{"id":25977594,"url":"https://github.com/nylander/fastear","last_synced_at":"2025-03-05T04:38:39.889Z","repository":{"id":146538558,"uuid":"258887648","full_name":"nylander/FastEAR","owner":"nylander","description":"FastEAR - Fast(er) Extraction of Alignment Regions from FASTA","archived":false,"fork":false,"pushed_at":"2024-09-11T12:54:09.000Z","size":35,"stargazers_count":1,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-09-11T19:54:46.951Z","etag":null,"topics":["faidx","fasta","samtools"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nylander.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-25T22:32:42.000Z","updated_at":"2024-09-11T12:55:54.000Z","dependencies_parsed_at":"2024-06-13T05:05:17.442Z","dependency_job_id":null,"html_url":"https://github.com/nylander/FastEAR","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nylander%2FFastEAR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nylander%2FFastEAR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nylander%2FFastEAR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nylander%2FFastEAR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nylander","download_url":"https://codeload.github.com/nylander/FastEAR/tar.gz/
refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241966989,"owners_count":20050324,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["faidx","fasta","samtools"],"created_at":"2025-03-05T04:38:39.260Z","updated_at":"2025-03-05T04:38:39.881Z","avatar_url":"https://github.com/nylander.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FastEAR - Fast(er) Extraction of Alignment Regions\n\n- Last modified: ons sep 11, 2024  02:53\n- Sign: JN\n\n## Description\n\nShell (bash) scripts for extracting regions from a fasta-formatted nucleotide\nalignment based on the description of ranges in a [partitions\nfile](#example-partitions-file).\n\nThe scripts act as a wrapper for the main software that performs the\nextraction: faidx. GNU parallel is used for doing the extraction in parallel.\n\nThe string in the first column of the partitions file will be used as the stem\nof the output file name, and the suffix `.fas` will be added (for example:\n`Apa.fas`). The output file will contain all fasta entries in the input fasta\nfile, but only with the sequence positions as specified in the partitions file.\n\nNote that only the first string (without white-space characters!) 
in the fasta\nheaders will be used in the output.\n\nCurrently two versions of the script are provided, differing in which version of\nfaidx is used (see [Requirements and\nInstallation](#requirements-and-installation)).\n\n*Update*: A script using bedtools for extraction is also provided (see [Requirements and\nInstallation](#requirements-and-installation)).\n\n## Usage\n\n    $ ./fastear_\u003cversion\u003e.sh fasta.fas partitions.txt\n\nExample:\n\n    $ ./fastear_samtools-1.10.sh data/fasta.fas data/partitions.txt\n\n## Example partitions file\n\n    Apa = 1-100\n    Bpa = 101-200\n    Cpa = 201-300\n    Dpa = 301-400\n    Epa = 401-484\n\n## Requirements and Installation\n\nMake sure to install [GNU parallel](https://www.gnu.org/software/parallel/),\nand faidx. For faidx, I tried both the Python version, pyfaidx\n([https://pypi.org/project/pyfaidx](https://pypi.org/project/pyfaidx)), and the\noriginal version from samtools\n([https://github.com/samtools/samtools](https://github.com/samtools/samtools)).\nSamtools v1.10 is available from, e.g., Ubuntu Linux repositories:\n\n    $ sudo apt install samtools\n\nThe syntax for samtools faidx has changed between minor samtools versions, and\nthere are two versions of the fastear-script supplied: one for samtools v1.7, and\none for v1.10 or above (last tested with v1.15).\n\nIn addition, if one wishes to use the \"bedtools\"-version, then `bedtools` needs\nto be installed (tested using v2.27). 
For example (on Ubuntu):\n\n    $ sudo apt install bedtools\n\nFinally, put the fastear-script(s) in your PATH (e.g., `cp fastear_*.sh ~/bin/`).\n\n## Timings\n\nFrom a fasta file with 146 sequences, each with length 6,180,000 bp, we\nextracted 4,818 alignments (on a GNU/Linux system with two Intel Xeon Silver\n4214 CPU @ 2.20GHz, 48 cores in total):\n\n#### Using GNU parallel\n\n    #  fastear pyfaidx v.0.5.8 parallel\n    $ time fastear_pyfaidx.sh data.fas partitions.txt\n    real    0m47,499s\n    user    16m44,924s\n    sys     3m8,175s\n\n    # fastear samtools faidx v.1.7 parallel\n    $ time fastear_samtools-1.7.sh data.fas partitions.txt\n    real    0m19,714s\n    user    1m43,326s\n    sys     1m18,667s\n\n    # fastear samtools faidx v.1.10 parallel\n    $ time fastear_samtools-1.10.sh data.fas partitions.txt\n    real    0m20,172s\n    user    1m31,063s\n    sys     1m12,016s\n\n    # fastear bedtools parallel\n    $ time fastear_bedtools.sh data.fas partitions.txt\n    real    0m19,784s\n    user    1m3,445s\n    sys     0m53,333s\n\n#### Using a \"while read\"-loop over the partitions file\n\n    #  fastear pyfaidx v.0.5.8 serial\n    $ time fastear_pyfaidx.serial.sh data.fas partitions.txt\n    real    11m16,616s\n    user    9m50,814s\n    sys     2m11,124s\n\n    # fastear samtools faidx v.1.7 serial\n    $ time fastear_samtools-1.7.serial.sh data.fas partitions.txt\n    real    2m2,562s\n    user    1m33,131s\n    sys     0m53,938s\n\n    # fastear samtools faidx v.1.10 serial\n    $ time fastear_samtools-1.10.serial.sh data.fas partitions.txt\n    real    1m47,825s\n    user    1m18,883s\n    sys     0m52,152s\n\n*Conclusions*: extraction is faster with parallelization.\nIn addition, the implementations differ in speed. From\nthe examples above, the scripts based on bedtools or samtools (v.1.10) seem\npreferable.\n\n## Disclaimer\n\nCurrently in beta version, with minimal error checking. 
*Caveat emptor!*\n\n## License and Copyright\n\nCopyright (C) 2020-2024 Johan Nylander \u003cjohan.nylander\\@nrm.se\u003e.\nDistributed under terms of the [MIT license](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnylander%2Ffastear","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnylander%2Ffastear","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnylander%2Ffastear/lists"}