{"id":39349081,"url":"https://github.com/yodeng/srautils","last_synced_at":"2026-01-18T02:26:48.610Z","repository":{"id":62985409,"uuid":"563708463","full_name":"yodeng/srautils","owner":"yodeng","description":"fast utils for fetch and dump NCBI SRA archive raw data","archived":false,"fork":false,"pushed_at":"2023-05-25T03:42:15.000Z","size":24,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-12-11T12:51:46.662Z","etag":null,"topics":["fastq-dump","ncbi-sra","sra-data","sra-toolkit","sratoolkit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yodeng.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-11-09T06:54:50.000Z","updated_at":"2023-05-17T08:15:27.000Z","dependencies_parsed_at":"2023-01-31T21:01:24.542Z","dependency_job_id":null,"html_url":"https://github.com/yodeng/srautils","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/yodeng/srautils","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yodeng%2Fsrautils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yodeng%2Fsrautils/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yodeng%2Fsrautils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yodeng%2Fsrautils/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yodeng","download_url":"https://codeload.github.com/yodeng/srautils/tar.gz/refs/heads/master","sbom_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yodeng%2Fsrautils/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28526569,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T00:39:45.795Z","status":"online","status_checked_at":"2026-01-18T02:00:07.578Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastq-dump","ncbi-sra","sra-data","sra-toolkit","sratoolkit"],"created_at":"2026-01-18T02:26:48.075Z","updated_at":"2026-01-18T02:26:48.600Z","avatar_url":"https://github.com/yodeng.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# srautils\n\n[![PyPI version](https://img.shields.io/pypi/v/srautils.svg?logo=pypi\u0026logoColor=FFE873)](https://pypi.python.org/pypi/srautils)\n\nsrautils is a program for downloading NCBI SRA archive data and dumping it to raw fastq data. It provides a fast and easy way to fetch SRA data and convert SRA files into fastq/fasta sequence data for scientific research.\n\n### 1. Requirements\n\n+ Linux64\n+ python \u003e=3.8\n+ [sratoolkit](https://github.com/ncbi/sra-tools/wiki/02.-Installing-SRA-Toolkit)\n\n### 2. Install\n\nThe latest release can be installed with\n\n\u003e pypi:\n\n```shell\npip3 install srautils -U\n```\n\nThe development version (recommended) can be installed with\n\n```shell\npip3 install git+https://github.com/yodeng/srautils.git\n```\n\n### 3. 
Usage\n\nsrautils includes the `srautils fetch` and `srautils dump` sub-commands.\n\n```\n$ srautils -h \nusage: srautils [-h] [-v] command ...\n\nfast utils for fetch and dump SRA archive raw fastq data\n\npositional arguments:\n  command\n    fetch        fetch raw sra data by SRA accession id\n    dump         dump sra into fastq/fasta sequence file\n\noptional arguments:\n  -h, --help     show this help message and exit\n  -v, --version  show program's version number and exit\n```\n\n#### 3.1 srautils fetch\n\nThe `fetch` command downloads an SRA file given only an SRA accession id. It is a fast, interruptible download accelerator.\n\nAll original SRA files are obtained directly from AWS Cloud with `UNSIGNED` access. The tool splits the whole download into many chunks and records the progress of each chunk in a `*.ht` binary file, which can significantly speed up the download. If the download is interrupted, it resumes automatically by loading the progress file. The command help is as follows:\n\n```\n$ srautils fetch -h \nusage: srautils fetch [-h] -i \u003cstr\u003e [-o \u003cstr\u003e] [-n \u003cint\u003e] [-s \u003cstr\u003e]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -i \u003cstr\u003e, --id \u003cstr\u003e  input sra-id, SRR/ERR/DRR allowed, required\n  -o \u003cstr\u003e, --outdir \u003cstr\u003e\n                        output sra directory, current dir by default\n  -n \u003cint\u003e, --num \u003cint\u003e\n                        the max number of concurrency, default: auto\n  -s \u003cstr\u003e, --max-speed \u003cstr\u003e\n                        specify maximum speed per second, case-insensitive unit support (K[b], M[b]...), no-limited by default\n```\n\n| options        | descriptions                                                 |\n| -------------- | ------------------------------------------------------------ |\n| -h/--help      | show this help message and exit                              |\n| -i/--id        
| input valid accession SRA id                                 |\n| -o/--outdir    | output directory                                             |\n| -n/--num       | max concurrency, detected automatically from sra file size   |\n| -s/--max-speed | maximum speed per second, case-insensitive unit support (K[b], M[b]...), unlimited by default |\n\n![fetch](https://user-images.githubusercontent.com/18365846/201565539-4df7ee9e-0a44-4786-8a90-1fd2e78d4ab5.png)\n\n#### 3.2 srautils dump\n\nThe `dump` command is a parallel `fastq-dump` wrapper that dumps an SRA file and outputs the raw `fastq/fasta` sequence data.\n\nNCBI `fastq-dump` is very slow, even with ample machine resources (network, IO, CPU). This tool speeds up the process by dividing the work into multiple chunked jobs and running them in parallel on localhost or on an sge cluster (the default). After all chunk jobs finish, the results are concatenated. The command usage is as follows:\n\n```\n$ srautils dump -h \nusage: srautils dump [-h] -i \u003cfile\u003e [-p \u003cint\u003e] [-l \u003cfile\u003e] [--local] [-o \u003cdir\u003e] [--no-gzip] [--fasta] [-q [\u003cstr\u003e ...]]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -i \u003cfile\u003e, --input \u003cfile\u003e\n                        input sra file, required\n  -p \u003cint\u003e, --processes \u003cint\u003e\n                        number of dumps processors, 10 by default\n  -l \u003cfile\u003e, --log \u003cfile\u003e\n                        append srautils log info to file, stdout by default\n  --local               run sra-dumps in localhost instead of sge\n\noutput arguments:\n  -o \u003cdir\u003e, --outdir \u003cdir\u003e\n                        output directory, current dir by default\n  --no-gzip             do not compress output\n  --fasta               output fasta only\n\nsge arguments:\n  -q [\u003cstr\u003e ...], --queue [\u003cstr\u003e ...]\n                    
    sge queue, multi-queue can be sepreated by whitespace, all.q by default\n```\n\n| options        | descriptions                                                 |\n| -------------- | ------------------------------------------------------------ |\n| -h/--help      | show this help message and exit                              |\n| -i/--input     | input sra file                                               |\n| -p/--processes | number of chunks (dump processes), 10 by default             |\n| -l/--log       | process logging file, stdout by default                      |\n| --local        | run all chunked jobs on localhost instead of the sge cluster |\n| -o/--outdir    | output directory                                             |\n| --no-gzip      | do not gzip output, gzip output by default                   |\n| --fasta        | output fasta instead of fastq                                |\n| -q/--queue     | sge queue(s) for the chunked jobs, `all.q` by default |\n\n![dump](https://user-images.githubusercontent.com/18365846/201566132-b3d8e0d3-426e-44f5-b9d6-6db58020dbff.png)\n\n### 4. License\n\n`srautils` is distributed under the [MIT License](https://github.com/yodeng/srautils/blob/master/LICENSE).\n\n### 5. References\n\n+ [NIH NCBI Sequence Read Archive (SRA) on AWS](https://registry.opendata.aws/ncbi-sra/)\n+ [ncbi/sra-tools](https://github.com/ncbi/sra-tools)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyodeng%2Fsrautils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyodeng%2Fsrautils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyodeng%2Fsrautils/lists"}