{"id":22727081,"url":"https://github.com/microbiomedata/metat","last_synced_at":"2025-10-27T00:25:55.720Z","repository":{"id":42122166,"uuid":"309489996","full_name":"microbiomedata/metaT","owner":"microbiomedata","description":"Metatranscriptomics workflow","archived":false,"fork":false,"pushed_at":"2025-03-25T16:43:37.000Z","size":23257,"stargazers_count":4,"open_issues_count":6,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-25T17:45:10.354Z","etag":null,"topics":["metatranscriptomics","transcriptomics","workflow"],"latest_commit_sha":null,"homepage":"","language":"WDL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microbiomedata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-11-02T20:42:32.000Z","updated_at":"2025-03-25T16:43:41.000Z","dependencies_parsed_at":"2024-06-06T02:46:00.372Z","dependency_job_id":"43730d0a-473e-4d8a-bab4-9e0805c86a2a","html_url":"https://github.com/microbiomedata/metaT","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microbiomedata%2FmetaT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microbiomedata%2FmetaT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microbiomedata%2FmetaT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microbiomedata%2FmetaT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microbiomedata","do
wnload_url":"https://codeload.github.com/microbiomedata/metaT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250442356,"owners_count":21431312,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["metatranscriptomics","transcriptomics","workflow"],"created_at":"2024-12-10T17:09:13.855Z","updated_at":"2025-10-27T00:25:50.681Z","avatar_url":"https://github.com/microbiomedata.png","language":"WDL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# metaT: The Metatranscriptome Workflow\n\n## Summary\nThis workflow is designed to analyze metatranscriptomes.\n\n![metatranscriptomics workflow](docs/metat_workflow2024.svg)\n\nAll parts of this workflow are housed in their own repositories and imported via WDL v1.0 HTTPS imports. \nThe following repositories are used in this workflow:\n - [metaT_ReadsQC](https://github.com/microbiomedata/metaT_ReadsQC)\n - [metaT_Assembly](https://github.com/microbiomedata/metaT_Assembly)\n - [mg_annotation](https://github.com/microbiomedata/mg_annotation)\n - [metaT_ReadCounts](https://github.com/microbiomedata/metaT_ReadCounts)\n\n## Version\n0.0.6\n\n## Third party tools and packages\nTo run this workflow you will need a Docker instance (Docker ≥ v2.1.0.3) and Cromwell. All the third party tools are pulled from Docker Hub.\n\n```\nbbtools ≥ v38.94\nPython ≥ v3.7.12\npandas ≥ v1.0.5 (python package)\ngffutils ≥ v0.10.1 (python package)\n```\n\n## Databases\nmetaT uses the same databases used for metagenome annotation. 
See the README [here](https://github.com/microbiomedata/mg_annotation) for required databases. For QC databases see [here](https://github.com/microbiomedata/ReadsQC).\n\n\n## Running workflow\n\n### On a server with Shifter\nThe submit script requests a node and launches Cromwell. Cromwell manages the workflow by using Shifter to run applications.\n\n\n```\njava -Dconfig.file=wdls/shifter.conf -jar /full/path/to/cromwell-XX.jar run -i input.json /full/path/to/wdls/metaT.wdl\n\n```\n\n\n## Docker images\n\n- [microbiomedata/meta_t:0.0.5](https://hub.docker.com/r/microbiomedata/meta_t)\n- [bryce911/bbtools:38.86](https://hub.docker.com/r/microbiomedata/bbtools)\n\n\n## Inputs\n\n```json\n{\n    \"metaT.input_files\": [\"./test_data/small_test/test_small_interleave.fastq.gz\"],\n    \"metaT.project_id\":\"nmdc:xxxxxxx\",\n    \"metaT.strand_type\": \"aRNA\"\n}\n```\n### Input option descriptions:\n- `project_id`: A unique name for your project or sample.\n- `input_file`: Full path to the fastq file. The file must be an interleaved paired-end fastq.\n- `input_fq1` and `input_fq2`: Use these instead if the paired-end fastq files are not interleaved.\n- `strand_type`: (optional) RNA strandedness, either left blank, `aRNA`, or `non_stranded_RNA`\n\n## Outputs\nAll outputs can be found in the `outdir` folder. It contains the following subfolders:\n- `outdir/annotation`: contains GFF files from the annotation run.\n- `outdir/assembly`: contains FASTA files from assembly and BAM files where reads were mapped back to the contigs.\n- `outdir/readMapping`: contains JSON files for sense and antisense, with records of features, their annotations, read counts, and associated statistics.\n- `outdir/readsQC`: contains cleaned reads and a file with associated statistics.\n\n## Output JSON\nThe output is a JSON-formatted file called `out.json` whose records contain read counts and annotation information. 
An example JSON record:\n```json\n{\n    \"featuretype\": \"CDS\",\n    \"seqid\": \"nmdc:xxxxxxx_001\",\n    \"id\": \"nmdc:xxxxxxx_001_1_588\",\n    \"source\": \"Prodigal v2.6.3_patched\",\n    \"start\": 1,\n    \"end\": 588,\n    \"length\": 588,\n    \"strand\": \"+\",\n    \"frame\": \"0\",\n    \"product\": \"hypothetical protein\",\n    \"product_source\": \"Hypo-rule applied\",\n    \"sense_read_count\": 25,\n    \"mean\": 5.0,\n    \"median\": 3.0,\n    \"stdev\": 6.1,\n    \"antisense_read_count\": 28,\n    \"meanA\": 7.14,\n    \"medianA\": 7,\n    \"stdevA\": 5.7\n}\n```\n\n## Test\nTo test the workflow, we have provided a small test dataset and step-by-step guidance. See the `test_data` folder.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrobiomedata%2Fmetat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrobiomedata%2Fmetat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrobiomedata%2Fmetat/lists"}