{"id":34107075,"url":"https://github.com/maxibor/corecomb","last_synced_at":"2026-04-06T01:31:44.314Z","repository":{"id":220499741,"uuid":"750871115","full_name":"maxibor/corecomb","owner":"maxibor","description":"Toolkit to deal with core-genome alignments recombination detection","archived":false,"fork":false,"pushed_at":"2025-01-07T09:56:12.000Z","size":2365,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-02-28T20:29:12.761Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maxibor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-31T13:44:43.000Z","updated_at":"2025-01-07T09:56:16.000Z","dependencies_parsed_at":"2024-02-02T12:25:39.190Z","dependency_job_id":"ff4343aa-e7d0-4e30-9516-a1513f7ece26","html_url":"https://github.com/maxibor/corecomb","commit_stats":null,"previous_names":["maxibor/corecomb"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/maxibor/corecomb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxibor%2Fcorecomb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxibor%2Fcorecomb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxibor%2Fcorecomb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxibor%2Fcorecomb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/owners/maxibor","download_url":"https://codeload.github.com/maxibor/corecomb/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxibor%2Fcorecomb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31456608,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T21:22:52.476Z","status":"ssl_error","status_checked_at":"2026-04-05T21:22:51.943Z","response_time":75,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-14T18:04:37.908Z","updated_at":"2026-04-06T01:31:44.308Z","avatar_url":"https://github.com/maxibor.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/maxibor/corecomb/master/img/logo_text_small.png\" width=\"300\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/maxibor/corecomb/actions/workflows/ci.yaml\"\u003e\n        \u003cimg src=\"https://github.com/maxibor/corecomb/actions/workflows/ci.yaml/badge.svg\" alt=\"Workflow status badge\"\u003e\n        \u003ca href=\"https://pypi.org/project/corecomb\"\u003e\u003cimg src=\"https://badge.fury.io/py/corecomb.svg\" alt=\"PyPI version\" height=\"18\"\u003e\u003c/a\u003e\n    
\u003c/a\u003e\n\u003c/p\u003e\n\n**Corecomb**: create a XMFA file from Panaroo core gene alignments to detect recombination in core-genome using ClonalFrameML.\n\n## Installation\n\n```bash\npip install corecomb\n```\n\n## Quick start\n\nIf you are in Panaroo output directory, just run: \n\n```\ncorecomb \n```\n\n## Get help\n\n```bash\n$ corecomb --help\n\n Usage: corecomb [OPTIONS]\n\n Create XMFA file from ClonalFrameML input from Panaroo core-genome gene alignments\n\n╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ --gene_al_dir    TEXT  Path to directory containing core-genome gene alignments [default: core_gene_alignments]                 │\n│ --pan_fa         TEXT  Path to Panaroo pan_genome_reference.fa [default: pan_genome_reference.fa]                               │\n│ --extension      TEXT  File extension of core-genome gene alignments [default: fas]                                             │\n│ --outfile        TEXT  Path to output XMFA file [default: corecomb.xmfa]                                                        │\n│ --help                 Show this message and exit.                                                                              
│\n╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n```\n\n## Why\n\nIn theory, one could take the individual core-gene multiple sequence alignments from Panaroo's `core_gene_alignments` directory and concatenate them into an [XMFA file](https://darlinglab.org/mauve/user-guide/files.html) with a single `sed` command.\n\n```bash\nsed -e '$s/$/\\n=/' -s ../tests/data/aligned_gene_sequences_raw/*.fas \u003e core_gene_alignment.xmfa\n```\n\nHowever, this approach suffers from three issues:\n\n- Sequence names need to be cleaned\n- Ambiguous IUPAC characters other than `N` need to be handled (ClonalFrameML, abbreviated CFML, only accepts `A,T,G,C,N,-`)\n- Genomes with missing genes will cause CFML to crash (this happens when the core genome is defined at less than 100%)\n\n\u003e CoRecomb addresses all three of these issues. Additionally, CoRecomb uses the gene order [defined in the `pan_genome_reference.fa`](https://github.com/gtonkinhill/panaroo/issues/146) to re-order the genes in the XMFA file (this order is preserved in the CFML output `core_gene_test_cfml.filtered.fasta`).\n\n## Test it for yourself\n\n```bash\npoetry run pytest -vv\n```\n\nTest data can be found in [tests/data](tests/data).\n\n```bash\ncorecomb \\\n    --gene_al_dir tests/data/aligned_gene_sequences_raw \\\n    --pan_fa tests/data/pan_genome_reference.fa \\\n    --extension fas \\\n    --outfile corecomb.xmfa\n```\n\n## Use the XMFA with ClonalFrameML\n\n```bash\nClonalFrameML \\\n    input_tree.nwk \\\n    corecomb.xmfa \\\n    cfml_output_basename \\\n    -xmfa_file true \\\n    -show_progress true \\\n    -output_filtered true\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxibor%2Fcorecomb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxibor%2Fcorecomb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxibor%2Fcorecomb/lists"}