{"id":22727103,"url":"https://github.com/microbiomedata/metaassembly","last_synced_at":"2025-10-09T09:15:34.617Z","repository":{"id":44408206,"uuid":"265963967","full_name":"microbiomedata/metaAssembly","owner":"microbiomedata","description":"Workflow for metagenome assembly","archived":false,"fork":false,"pushed_at":"2025-07-30T19:49:40.000Z","size":11663,"stargazers_count":4,"open_issues_count":5,"forks_count":4,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-07-30T22:19:32.718Z","etag":null,"topics":["assembly","metagenome-assembly","wdl","wdl-workflow","workflow"],"latest_commit_sha":null,"homepage":"","language":"WDL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microbiomedata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-05-21T22:24:45.000Z","updated_at":"2025-06-24T15:55:02.000Z","dependencies_parsed_at":"2023-02-15T13:31:50.423Z","dependency_job_id":"b1df32cf-93bd-4312-a14c-af3e4f5c54cf","html_url":"https://github.com/microbiomedata/metaAssembly","commit_stats":null,"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/microbiomedata/metaAssembly","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microbiomedata%2FmetaAssembly","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microbiomedata%2FmetaAssembly/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microbiomedata%2FmetaAssembly/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/microbiomedata%2FmetaAssembly/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microbiomedata","download_url":"https://codeload.github.com/microbiomedata/metaAssembly/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microbiomedata%2FmetaAssembly/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279001114,"owners_count":26083021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assembly","metagenome-assembly","wdl","wdl-workflow","workflow"],"created_at":"2024-12-10T17:09:41.389Z","updated_at":"2025-10-09T09:15:34.586Z","avatar_url":"https://github.com/microbiomedata.png","language":"WDL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# The Metagenome Assembly Pipeline\n\n## Summary\nThis workflow was developed by Brian Foster at JGI and originates from his [repo](https://gitlab.com/bfoster1/wf_templates/tree/master/templates). It takes paired-end Illumina short reads or PacBio long reads as input.\n\nFor short reads, the workflow reformats the interleaved file into two FASTQ files for downstream tasks using bbcms (BBTools). The corrected reads are assembled using metaSPAdes. 
After assembly, the reads are mapped back to the contigs by bbmap (BBTools) to obtain coverage information. The `.wdl` (Workflow Description Language) file includes five tasks: *bbcms*, *assy*, *create_agp*, *read_mapping_pairs*, and *make_output*.\n\nFor long reads, the workflow uses Flye for assembly, pbmm2 for alignment, Racon for polishing, and minimap2 for read mapping and coverage analysis. The `.wdl` (Workflow Description Language) file includes six tasks: *combine_fastq*, *assy*, *racon*, *format_assembly*, *map*, and *make_info_file*.\n\n\n## The Docker image and Dockerfile can be found here\n\n[microbiomedata/bbtools:39.03](https://hub.docker.com/r/microbiomedata/bbtools)\n\n[microbiomedata/spades:4.0.0](https://hub.docker.com/r/microbiomedata/spades)\n\n\n## Input files\n\n1. The path to the input FASTQ file (Illumina paired-end interleaved FASTQ or PacBio paired-end interleaved FASTQ) (recommended: output of the Reads QC workflow).\n    \n2. Project name, e.g. `nmdc:XXXXXX`\n    \n3. Memory (optional), e.g. `\"jgi_metaAssembly.memory\": \"105G\"`\n\n4. Threads (optional), e.g. `\"jgi_metaAssembly.threads\": \"16\"`\n\n5. Whether the input is short reads (boolean), e.g. `\"jgi_metaAssembly.shortRead\": true`\n\n\n```\n{\n        \"jgi_metaAssembly.input_files\": [\"https://portal.nersc.gov/project/m3408/test_data/smalltest.int.fastq.gz\"],\n        \"jgi_metaAssembly.proj\": \"nmdc:XXXXXX\",\n        \"jgi_metaAssembly.memory\": \"105G\",\n        \"jgi_metaAssembly.threads\": \"16\",\n        \"jgi_metaAssembly.shortRead\": true\n}\n```\n\n## Output files\n\nBelow is a partial list of the output files. 
The main assembly contigs output is in final_assembly/assembly.contigs.fasta.\n\n```\n# Short Reads\n    output/\n    ├── nmdc_XXXXXX_metaAsm.info\n    ├── nmdc_XXXXXX_covstats.txt\n    ├── nmdc_XXXXXX_contigs.fna\n    ├── nmdc_XXXXXX_bbcms.fastq.gz\n    ├── nmdc_XXXXXX_scaffolds.fna\n    ├── nmdc_XXXXXX_assembly.agp\n    ├── stats.json\n    ├── nmdc_XXXXXX_pairedMapped.sam.gz\n    └── nmdc_XXXXXX_pairedMapped_sorted.bam\n# Long Reads\n    output/\n    ├── nmdc_XXXXXX_assembly.legend\n    ├── nmdc_XXXXXX_contigs.fna\n    ├── nmdc_XXXXXX_pairedMapped_sorted.bam\n    ├── nmdc_XXXXXX_read_count_report.txt\n    ├── nmdc_XXXXXX_metaAsm.info\n    ├── nmdc_XXXXXX_summary.stats\n    ├── nmdc_XXXXXX_scaffolds.fna\n    ├── nmdc_XXXXXX_pairedMapped.sam.gz\n    ├── stats.json\n    ├── nmdc_XXXXXX_contigs.sam.stats\n    ├── nmdc_XXXXXX_contigs.sorted.bam.pileup.basecov\n    ├── nmdc_XXXXXX_assembly.agp\n    └── nmdc_XXXXXX_contigs.sorted.bam.pileup.out\n```\n\n## Link to Doc Site\nPlease refer to the [documentation site](https://docs.microbiomedata.org/workflows/chapters/4_Metagenome_Assembly/) for more information.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrobiomedata%2Fmetaassembly","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrobiomedata%2Fmetaassembly","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrobiomedata%2Fmetaassembly/lists"}