{"id":32611442,"url":"https://github.com/ctlab/recast","last_synced_at":"2026-06-23T21:31:28.250Z","repository":{"id":78880424,"uuid":"182258990","full_name":"ctlab/RECAST","owner":"ctlab","description":"RECAST (Recipient intestinE Colonisation AnalysiS Tool)","archived":false,"fork":false,"pushed_at":"2021-09-17T18:27:10.000Z","size":66171,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-30T13:58:57.413Z","etag":null,"topics":["bioinformatics","metagenomics"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ctlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-04-19T12:07:55.000Z","updated_at":"2021-09-17T18:27:13.000Z","dependencies_parsed_at":null,"dependency_job_id":"35b7a99c-f6cc-4525-be94-bc52300115a5","html_url":"https://github.com/ctlab/RECAST","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ctlab/RECAST","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctlab%2FRECAST","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctlab%2FRECAST/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctlab%2FRECAST/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctlab%2FRECAST/manifests","owner_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/owners/ctlab","download_url":"https://codeload.github.com/ctlab/RECAST/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctlab%2FRECAST/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34708271,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","metagenomics"],"created_at":"2025-10-30T13:57:53.946Z","updated_at":"2026-06-23T21:31:28.245Z","avatar_url":"https://github.com/ctlab.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"**RECAST (Recipient intestinE Colonisation AnalysiS Tool)** is a tool for analysing metagenome time series and distinguish which reads of one metagenome sample are found in other samples.\nThe implementation is based on [MetaCherchant](https://github.com/ivartb/metacherchant) source code.\n\n## Table of contents\n\u003c!--ts--\u003e\n  * [Installation](#installation)\n  * [Usage example](#usage-example)\n    * [Simple reads classifier](#simple-reads-classifier)\n      * [Output](#output-description)\n      * [Results visualisation](#results-visualisation)\n    * [Accurate reads classifier](#accurate-reads-classifier)\n      * [Output](#output-description-1) \n  * [Using k-mers 
for speed up](#using-k-mers-for-speed-up)\n  * [Citation](#citation)\n\u003c!--te--\u003e\n\n## Installation\n\nYou need JRE version 1.8 or higher installed, the file `metacherchant.jar`, and one of\nthese two scripts: `reads_classifier.sh` for two-category classification or `triple_reads_classifier.sh` for three-category classification.\n\nScripts have been tested under CentOS release 6.7, but should generally work on Linux/macOS.\n\n## Usage example\n\nBoth pipelines were designed for analysing human gut microbiota after \n[fecal microbiota transplantation](https://en.wikipedia.org/wiki/Fecal_microbiota_transplant).\nEach pipeline takes three metagenome samples (donor sample, pre-FMT recipient sample, and post-FMT recipient sample)\nand splits reads from each metagenome into different categories depending on their colonization of the recipient's gut.\n\nThe pipelines can also be used for analysing any metagenome time series; just keep in mind that the category names are FMT-specific.\n\n### Simple reads classifier\n\nThe simple reads classifier uses hard splitting criteria and splits reads into eight categories.\n\nHere is a bash command showing typical usage of the simple reads classifier:\n\n~~~\n./reads_classifier.sh -k 31 \\\n    -d \u003cdonor_1.fasta donor_2.fasta\u003e \\\n    -b \u003cbefore_1.fasta before_2.fasta\u003e \\\n    -a \u003cafter_1.fasta after_2.fasta\u003e \\\n    -found 90 \\\n    -w \u003cworkDir\u003e \\\n    -o \u003coutDir\u003e \\\n    -corr \\\n    -m \u003cmem\u003e \\\n    -p \u003cproc\u003e \\\n    -interval95 \\\n    -v \\\n    -dk \u003cdonor.kmers.bin\u003e \\\n    -bk \u003cbefore.kmers.bin\u003e \\\n    -ak \u003cafter.kmers.bin\u003e\n~~~\n\n* `-k` — the size of k-mer used in de Bruijn graph\n* `-d` — two files with paired donor metagenomic reads. FASTA and FASTQ formats are supported, as well as compressed files *.gz or *.bz2 \n* `-b` — two files with paired pre-FMT recipient metagenomic reads. 
FASTA and FASTQ formats are supported, as well as compressed files *.gz or *.bz2\n* `-a` — two files with paired post-FMT recipient metagenomic reads. FASTA and FASTQ formats are supported, as well as compressed files *.gz or *.bz2\n* `-found` — Minimum coverage breadth for reads from class found \\[0 - 100 %\\] (optional, default: 90)\n* `-w` — directory with intermediate working files (optional, default: workDir)\n* `-o` — directory for final categories of reads (optional, default: outDir)\n* `-corr` — do replacement of nucleotide in read with one low quality position (optional)\n* `-m` — memory to use (for example: 1500M, 4G, etc.) (optional, default: 2 Gb)\n* `-p` — available processors (optional, default: all)\n* `-interval95` — set the interval width to probability 0.95 (optional)\n* `-v` — enable debug output (optional)\n* `-dk` — one file with donor k-mers in binary form (SEE: [Using k-mers for speed up](#using-k-mers-for-speed-up))\n* `-bk` — one file with pre-FMT recipient k-mers in binary form (SEE: [Using k-mers for speed up](#using-k-mers-for-speed-up))\n* `-ak` — one file with post-FMT recipient k-mers in binary form (SEE: [Using k-mers for speed up](#using-k-mers-for-speed-up))\n\n#### Output description\n\nAfter the end of the analysis, the results can be found in the folder specified in `-o` parameter\n\n* Reads from donor metagenome are split into two categories:\n\n  * `settle_[1|2|s].fastq` — reads which were found in post-FMT recipient metagenome\n\n  * `not_settle_[1|2|s].fastq` — reads which were not found in post-FMT recipient metagenome\n\n* Reads from pre-FMT recipient metagenome are split into two categories:\n\n  * `stay_[1|2|s].fastq` — reads which were found in post-FMT recipient metagenome\n\n  * `gone_[1|2|s].fastq` — reads which were not found in post-FMT recipient metagenome\n\n* Reads from post-FMT recipient metagenome are split into four categories:\n\n  * `came_from_both_[1|2|s].fastq` — reads which were found both in donor and 
pre-FMT recipient metagenome\n\n  * `came_from_donor_[1|2|s].fastq` — reads which were found only in donor metagenome\n\n  * `came_from_baseline_[1|2|s].fastq` — reads which were found only in pre-FMT recipient metagenome\n\n  * `came_itself_[1|2|s].fastq` — reads which were found neither in donor metagenome nor in pre-FMT recipient metagenome\n\n#### Results visualisation\n\nYou can visualise how classified reads cover a nucleotide sequence in post-FMT de Bruijn graphs using the [Bandage](https://rrwick.github.io/Bandage/) tool. Run the `fmt_visualiser.sh` script as in the example below:\n\n~~~\n./fmt_visualiser.sh -k 31 \\\n    -seq \u003cseq.fasta\u003e \\\n    -a \u003cafter_1.fasta after_2.fasta\u003e \\\n    -i \u003cinputDir\u003e \\\n    -w \u003cworkDir\u003e \\\n    -o \u003coutDir\u003e \\\n    -m \u003cmem\u003e \\\n    -p \u003cproc\u003e \\\n    -v\n~~~\n\n* `-k` — the size of k-mer used in de Bruijn graph (must be the **same** as in `reads_classifier.sh`)\n* `-seq` — file with sequences to visualise in FASTA format\n* `-a` — two files with paired post-FMT recipient metagenomic reads. FASTA and FASTQ formats are supported, as well as compressed files *.gz or *.bz2  (must be the **same** as in `reads_classifier.sh`)\n* `-i` — directory containing output of `reads_classifier.sh` FMT classification script\n* `-w` — directory with intermediate working files (optional, default: workDir)\n* `-o` — directory for final visualisation files (optional, default: outDir)\n* `-m` — memory to use (for example: 1500M, 4G, etc.) 
(optional, default: 2 Gb)\n* `-p` — available processors (optional, default: all)\n* `-v` — enable debug output (optional) \n\nThe output folder (specified by `-o`) contains the following files:\n\n* `*.fasta` — FASTA files containing merged nodes of the post-FMT recipient graph around the sequences, with information about neighbors in the description line\n\n* `*.gfa` — post-FMT graphs around the sequences in [GFA format](https://github.com/GFA-spec/GFA-spec/blob/master/GFA-spec.md), which Bandage accepts as input. Follow the Bandage instructions to get a colored visualisation of the classification results.\n\n**Post-FMT recipient graphs (`after/*.gfa`)** are colored with five colors:\n\n![](https://via.placeholder.com/15/008000?text=+) green nodes — parts of the graph classified as `came from both`\n\n![](https://via.placeholder.com/15/ff0000?text=+) red nodes — parts of the graph classified as `came from donor`\n\n![](https://via.placeholder.com/15/0000ff?text=+) blue nodes — parts of the graph classified as `came from baseline`\n\n![](https://via.placeholder.com/15/ffff00?text=+) yellow nodes — parts of the graph classified as `came itself`\n\n![](https://via.placeholder.com/15/999999?text=+) grey nodes — parts of the graph covered by multiple categories\n\n\n### Accurate reads classifier\n\nThe accurate reads classifier uses soft splitting criteria and splits reads into thirteen categories.\nIt also uses two values of `k` to build the de Bruijn graphs, which makes the algorithm more accurate.\n\nHere is a bash command showing typical usage of the accurate reads classifier:\n\n~~~\n./triple_reads_classifier.sh -k 31 \\\n    -k2 61 \\\n    -d \u003cdonor_1.fasta donor_2.fasta\u003e \\\n    -b \u003cbefore_1.fasta before_2.fasta\u003e \\\n    -a \u003cafter_1.fasta after_2.fasta\u003e \\\n    -found 90 \\\n    -half 40 \\\n    -w \u003cworkDir\u003e \\\n    -o \u003coutDir\u003e \\\n    -corr \\\n    -m \u003cmem\u003e \\\n    -p 
\u003cproc\u003e \\\n    -interval95 \\\n    -v \\\n    -dk1 \u003cdonor_k.kmers.bin\u003e \\\n    -dk2 \u003cdonor_k2.kmers.bin\u003e\\\n    -bk1 \u003cbefore_k.kmers.bin\u003e \\\n    -bk2 \u003cbefore_k2.kmers.bin\u003e\\\n    -ak1 \u003cafter_k.kmers.bin\u003e \\\n    -ak2 \u003cafter_k2.kmers.bin\u003e\n~~~\n\n* `-k` — the size of k-mer used in de Bruijn graph\n* `-k2` — the second size of k-mer used in de Bruijn graph. k2 \u003e k\n* `-d` — two files with paired donor metagenomic reads. FASTA and FASTQ formats are supported, as well as compressed files *.gz or *.bz2\n* `-b` — two files with paired pre-FMT recipient metagenomic reads. FASTA and FASTQ formats are supported, as well as compressed files *.gz or *.bz2\n* `-a` — two files with paired post-FMT recipient metagenomic reads. FASTA and FASTQ formats are supported, as well as compressed files *.gz or *.bz2\n* `-found` — Minimum coverage breadth for reads from class found \\[0 - 100 %\\] (optional, default: 90)\n* `-half` — Minimum coverage breadth for reads from class half-found \\[0 - 100 %\\] (optional, default: 40)\n* `-w` — directory with intermediate working files (optional, default: workDir)\n* `-o` — directory for final categories of reads (optional, default: outDir)\n* `-corr` — do replacement of nucleotide in read with one low quality position (optional)\n* `-m` — memory to use (for example: 1500M, 4G, etc.) 
(optional, default: 2 Gb)\n* `-p` — available processors (optional, default: all)\n* `-interval95` — set the interval width to probability 0.95 (optional)\n* `-v` — enable debug output (optional)\n* `-dk1` — one file with donor k-mers in binary form with k=**k** (SEE: [Using k-mers for speed up](#using-k-mers-for-speed-up))\n* `-dk2` — one file with donor k-mers in binary form with k=**k2** (SEE: [Using k-mers for speed up](#using-k-mers-for-speed-up))\n* `-bk1` — one file with pre-FMT recipient k-mers in binary form with k=**k** (SEE: [Using k-mers for speed up](#using-k-mers-for-speed-up))\n* `-bk2` — one file with pre-FMT recipient k-mers in binary form with k=**k2** (SEE: [Using k-mers for speed up](#using-k-mers-for-speed-up))\n* `-ak1` — one file with post-FMT recipient k-mers in binary form with k=**k** (SEE: [Using k-mers for speed up](#using-k-mers-for-speed-up))\n* `-ak2` — one file with post-FMT recipient k-mers in binary form with k=**k2** (SEE: [Using k-mers for speed up](#using-k-mers-for-speed-up))\n\n#### Output description\n\nAfter the end of the analysis, the results can be found in the folder specified in `-o` parameter\n\n* Reads from donor metagenome are split into three categories:\n\n  * `settle_[1|2|s].fastq` — reads which were found in post-FMT recipient metagenome\n\n  * `half_settle_[1|2|s].fastq` — reads close to which were found in post-FMT recipient metagenome\n\n  * `not_settle_[1|2|s].fastq` — reads which were not found in post-FMT recipient metagenome\n\n* Reads from pre-FMT recipient metagenome are split into three categories:\n\n  * `stay_[1|2|s].fastq` — reads which were found in post-FMT recipient metagenome\n\n  * `half_stay_[1|2|s].fastq` — reads close to which were found in post-FMT recipient metagenome\n\n  * `gone_[1|2|s].fastq` — reads which were not found in post-FMT recipient metagenome\n\n* Reads from post-FMT recipient metagenome are split into seven categories:\n\n  * `came_from_both_[1|2|s].fastq` — reads which were 
found both in donor and pre-FMT recipient metagenome\n\n  * `came_from_donor_[1|2|s].fastq` — reads which were found only in donor metagenome\n\n  * `came_from_baseline_[1|2|s].fastq` — reads which were found only in pre-FMT recipient metagenome\n\n  * `came_itself_[1|2|s].fastq` — reads which were found neither in donor metagenome nor in pre-FMT recipient metagenome\n\n  * `strain_from_donor_[1|2|s].fastq` — reads which were found in donor metagenome and close to which were found in pre-FMT recipient metagenome\n\n  * `strain_from_baseline_[1|2|s].fastq` — reads which were found in pre-FMT recipient metagenome and close to which were found in donor metagenome\n\n  * `strain_itself_[1|2|s].fastq` — reads close to which were found both in donor and pre-FMT recipient metagenome\n\n\n## Using k-mers for speed-up\n\nDe Bruijn graphs of k-mers extracted from input reads are used multiple times during execution. It is therefore highly recommended to extract k-mers from reads at a preprocessing stage and provide the files with extracted k-mers in binary format as input parameters. Use the `kmer-counter` tool from `metacherchant.jar` to do this.\n\nHere is a command showing usage of the tool:\n\n~~~\njava -jar metacherchant.jar -t kmer-counter \\\n    -k 31 \\\n    -i \u003creads_1.fasta reads_2.fasta\u003e \\\n    -w \u003cworkDir\u003e \\\n    -o \u003coutDir\u003e \\\n    -m \u003cmem\u003e \\\n    -p \u003cproc\u003e \\\n    -v\n~~~\n\n* `-k` — the size of k-mer to extract from reads\n* `-i` — two files with paired metagenomic reads. FASTA and FASTQ formats are supported, as well as compressed files *.gz or *.bz2 \n* `-w` — directory with intermediate working files (optional, default: workDir)\n* `-o` — directory for final categories of reads (optional, default: outDir)\n* `-m` — memory to use (for example: 1500M, 4G, etc.) 
(optional, default: 2 Gb)\n* `-p` — available processors (optional, default: all)\n* `-v` — enable debug output (optional)\n\n\n## Citation\n\nIf you use RECAST in your research, please cite the following publication:\n\nOlekhnovich, E. I., Ivanov, A. B., Ulyantsev, V. I., \u0026 Ilina, E. N. (2021). Separation of Donor and Recipient Microbial Diversity Allows Determination of Taxonomic and Functional Features of Gut Microbiota Restructuring following Fecal Transplantation. mSystems, 6(4), e00811-21. [https://doi.org/10.1128/mSystems.00811-21](https://doi.org/10.1128/mSystems.00811-21)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fctlab%2Frecast","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fctlab%2Frecast","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fctlab%2Frecast/lists"}