{"id":25977600,"url":"https://github.com/nylander/catfasta2phyml","last_synced_at":"2025-07-22T14:34:03.685Z","repository":{"id":6031176,"uuid":"7255331","full_name":"nylander/catfasta2phyml","owner":"nylander","description":"Concatenates FASTA formatted files to one \"phyml\" (PHYLIP) formatted file","archived":false,"fork":false,"pushed_at":"2024-09-30T14:44:28.000Z","size":71,"stargazers_count":68,"open_issues_count":0,"forks_count":22,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-05-07T14:52:33.457Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nylander.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2012-12-20T09:39:56.000Z","updated_at":"2025-03-12T07:52:30.000Z","dependencies_parsed_at":"2023-01-11T16:55:54.203Z","dependency_job_id":"c9149ecb-2816-4bc9-b12a-b346d8705c2a","html_url":"https://github.com/nylander/catfasta2phyml","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/nylander/catfasta2phyml","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nylander%2Fcatfasta2phyml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nylander%2Fcatfasta2phyml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nylander%2Fcatfasta2phyml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nylander%2Fcatfasta2phyml/manifests","owner_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nylander","download_url":"https://codeload.github.com/nylander/catfasta2phyml/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nylander%2Fcatfasta2phyml/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266510686,"owners_count":23940696,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-05T04:38:41.229Z","updated_at":"2025-07-22T14:34:03.659Z","avatar_url":"https://github.com/nylander.png","language":"Perl","funding_links":[],"categories":[],"sub_categories":[],"readme":"# catfasta2phyml\n\n### NAME\n\n`catfasta2phyml.pl` -- Concatenate FASTA alignments to PHYML, PHYLIP, or FASTA\nformat\n\n### SYNOPSIS\n\n    catfasta2phyml.pl [options] [files]\n\n### OPTIONS\n\n- **-h, -?, --help**\n\nPrint a brief help message and exits.\n\n- **-m, --man**\n\nPrints the manual page and exits.\n\n- **-c, --concatenate**\n\nConcatenate files even when number of taxa differ among alignments. 
Missing\ndata will be filled with all-gap (-) sequences.\n\n- **-i, --intersect**\n\nConcatenate sequences for sequence labels occurring in all input files\n(intersection).\n\n- **-f, --fasta**\n\nPrint output in FASTA format (default is PHYML format).\n\n- **-p, --phylip**\n\nPrint output in a strict PHYLIP format. See section \"Data file format\" on \n[https://phylipweb.github.io/phylip/doc/main.html#inputfiles](https://phylipweb.github.io/phylip/doc/main.html#inputfiles)\n\n**Note:** The current output is not entirely strict for the interleaved format.\nWhat remains is to efficiently print sequences in blocks of 10 characters. The\nsequential PHYLIP format works, on the other hand (use **-s** in combination\nwith **-p**).\n\n- **-s, --sequential**\n\nPrint output in sequential format (default is interleaved).\n\n- **-b, --basename=suffix**\n\nEnsure the basename is used as the partition definition. If the provided **suffix**\n(required) matches the file suffix, it will be removed from the output string.\n\n**Note:** If the suffix is to be kept, one may use this format: **--basename='\n'** (basically providing a string that will not match the file suffix).\n\n- **-v, --verbose**\n\nBe verbose by showing some useful output. See the combination with **-n**.\n\n- **-n, --noprint**\n\nDo not print the concatenation, just check if all files have the same sequence\nlabels and lengths. Program returns 1 on exit. See also the combination with\n**-v**.\n\n- **-V, --version**\n\nPrint version number and exit.\n\n### DESCRIPTION\n\n**catfasta2phyml.pl** will concatenate FASTA alignments to one file\n(interleaved PHYML or FASTA format) after checking that all sequences\nare aligned (of the same length). If there are sequence labels that are not\npresent in all files, a warning will be issued. 
Sequences can, however,\nstill be concatenated (with missing sequences filled with missing data\n(gaps)) if the argument **--concatenate** is used.\n\nIn addition, the output can be restricted to sequences whose labels are present in all files\n(the intersection) using the **--intersect** argument.\n\nThe program prints the concatenated data to **STDOUT**. A table with\ninformation about partitions is printed to **STDERR**. Example: \n\n    file1.fas = 1-625\n    file2.fas = 626-1019\n    file3.fas = 1020-2061\n    file4.fas = 2062-3364\n    file5.fas = 3365-3796\n\nSee below for how this table can be used in other software (e.g., IQ-Tree,\nRAxML-ng).\n\n### USAGE\n\nTo concatenate fasta files to a PHYML-readable format:\n\n    $ catfasta2phyml.pl file1.fas file2.fas \u003e out.phy\n    $ catfasta2phyml.pl *.fas \u003e out.phy 2\u003e partitions.txt\n    $ catfasta2phyml.pl --sequential *.fas \u003e out.phy\n    $ catfasta2phyml.pl --verbose *.fas \u003e out.phy\n\nTo concatenate fasta files to fasta format:\n\n    $ catfasta2phyml.pl -f file1.fas file2.fas \u003e out.fasta\n    $ catfasta2phyml.pl -f *.fas \u003e out.fasta\n\nTo check fasta alignments:\n\n    $ catfasta2phyml.pl --noprint --verbose *.fas\n    $ catfasta2phyml.pl -nv *.fas\n    $ catfasta2phyml.pl -n *.fas\n\nTo concatenate fasta files, while filling in missing taxa:\n\n    $ catfasta2phyml.pl --concatenate --verbose *.fas\n\nTo concatenate sequences for sequence labels occurring in all files:\n\n    $ catfasta2phyml.pl --intersect *.fas\n\nTo use the basename as the partition name and remove the suffix in the partition definition:\n\n    $ catfasta2phyml.pl -b.fas dat/file1.fas dat/file2.fas \u003e out.phy\n\n### TIPS\n\n**1. \"Argument list too long\" error?**\n\nIf we run into the issue of \"Argument list too long\" (where we have a command\nline longer than allowed on our system (`getconf ARG_MAX`) - which may happen\nif we try to concatenate many files), we can still do it, but in steps. 
For\nexample (here with some help of [GNU\nparallel](https://www.gnu.org/software/parallel/)):\n\n    $ catfasta2phyml.pl -c $(find . -type f -name '*.ali') \u003e concatenated.phy 2\u003e/dev/null\n    -bash: catfasta2phyml.pl: Argument list too long\n\nInstead, start by concatenating to intermediate files using GNU parallel:\n\n    $ find . -type f -name '*.ali' | \\\n          parallel -N1000 'catfasta2phyml.pl -c -f '\"{}\"' \u003e tmp.'\"{#}\"'.conc'\n\nThen concatenate the intermediate files to one:\n\n    $ catfasta2phyml.pl -c tmp.*.conc \u003e concatenated.phy 2\u003e/dev/null\n    $ rm tmp.*.conc\n\n\n**2. Prepare a RAxML-style partitions file**\n\nCatfasta2phyml does not check what data type (DNA, PROTEIN, etc.) is being\nconcatenated. It only checks the sequence labels and sequence lengths. When\nrunning catfasta2phyml, a list of partition names and relative positions is\nwritten to standard error.  A partition file (for, e.g.,\n[IQ-Tree](http://www.iqtree.org/) and\n[RAxML-ng](https://github.com/amkozlov/raxml-ng)) does require, however, a\ndata type to be given in front of the partition specification. Assuming that we\nare concatenating the same kind of data type, the preparation of a partitions\nfile is straightforward.  Below is an example using `sed` (GNU/Linux). 
Let us\nalso assume that we gave the full path to the input files (which prints the\npath in the output partition table), and that the data type is \"DNA\":\n\n    $ catfasta2phyml.pl -c dat/*.fas \u003e out.phy 2\u003e partitions.txt\n    $ cat partitions.txt\n    dat/file1.fas = 1-625\n    dat/file2.fas = 626-1019\n    dat/file3.fas = 1020-2061\n    dat/file4.fas = 2062-3364\n    dat/file5.fas = 3365-3796\n\nWe can now remove the `dat/` and the `.fas`, and add `DNA, ` on each line:\n\n    $ sed -i -e 's#dat/##' -e 's/\\.fas//' -e 's/^/DNA, /' partitions.txt\n    $ cat partitions.txt\n    DNA, file1 = 1-625\n    DNA, file2 = 626-1019\n    DNA, file3 = 1020-2061\n    DNA, file4 = 2062-3364\n    DNA, file5 = 3365-3796\n\n\n**3. But I want to split, not concatenate!**\n\nFacing the \"opposite\" situation (having a large concatenated fasta file that\nyou want to split into individual alignments)? If you have a corresponding\npartitions file, you may give FastEAR a try\n([https://github.com/nylander/FastEAR](https://github.com/nylander/FastEAR))!\n\n\n\n### AUTHOR\n\nWritten by Johan A. A. 
Nylander\n\n### DEPENDENCIES\n\nUses Perl modules Getopt::Long and Pod::Usage\n\n### LICENSE AND COPYRIGHT\n\nCopyright (c) 2010-2024 Johan Nylander\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n\n### DOWNLOAD\n\n\u003chttps://github.com/nylander/catfasta2phyml\u003e\n\n\n### INSTALL WITH CONDA\n\n    $ conda install -c bioconda catfasta2phyml\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnylander%2Fcatfasta2phyml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnylander%2Fcatfasta2phyml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnylander%2Fcatfasta2phyml/lists"}