{"id":13958720,"url":"https://github.com/mcveanlab/mccortex","last_synced_at":"2025-12-17T11:54:37.878Z","repository":{"id":16795076,"uuid":"19553701","full_name":"mcveanlab/mccortex","owner":"mcveanlab","description":"De novo genome assembly and multisample variant calling","archived":false,"fork":false,"pushed_at":"2019-03-28T11:36:02.000Z","size":10339,"stargazers_count":112,"open_issues_count":28,"forks_count":25,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-07-21T00:42:22.152Z","etag":null,"topics":["contigs","cortex","de-bruijn-graphs","genome-analysis","genome-assembly","genome-graph","genomics","kmer","variant-calling"],"latest_commit_sha":null,"homepage":"https://github.com/mcveanlab/mccortex/wiki","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mcveanlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-05-07T23:23:02.000Z","updated_at":"2025-02-22T18:08:54.000Z","dependencies_parsed_at":"2022-08-25T15:02:49.134Z","dependency_job_id":null,"html_url":"https://github.com/mcveanlab/mccortex","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/mcveanlab/mccortex","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcveanlab%2Fmccortex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcveanlab%2Fmccortex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcveanlab%2Fmccortex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcveanlab%2Fmccortex/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/mcveanlab","download_url":"https://codeload.github.com/mcveanlab/mccortex/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcveanlab%2Fmccortex/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27782842,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-17T02:00:08.291Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["contigs","cortex","de-bruijn-graphs","genome-analysis","genome-assembly","genome-graph","genomics","kmer","variant-calling"],"created_at":"2024-08-08T13:01:49.287Z","updated_at":"2025-12-17T11:54:37.873Z","avatar_url":"https://github.com/mcveanlab.png","language":"C","funding_links":[],"categories":["其他_生物医药"],"sub_categories":["网络服务_其他"],"readme":"McCortex: Population De Novo Assembly and Variant Calling\n===============================================\n\n* _Integrating long-range connectivity information into de Bruijn graphs_\n  Turner I, Garimella K, Iqbal Z, McVean G (*Bioinformatics*; Advanced access 15 March 2018)\n  https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty157/4938484\n\n\nMulti-sample de novo assembly and variant calling using Linked de bruijn graphs.\nVariant calling with and without a reference genome. 
Between closely related\nsamples or highly diverged ones. From bacterial to mammalian genomes. Minimal\nconfiguration. And it's free.\n\nIsaac Turner's rewrite of *cortex_var*, to handle larger populations\nwith better genome assembly, as a set of modular commands. PhD supervisor: Prof Gil McVean. Collaborators: Zam Iqbal, Kiran Garimella. Based at the Wellcome Trust Centre for Human Genetics, University of Oxford.\n\n27 May 2018\n\nBranch         | Status\n---------------|--------\nmaster:        | [![Build Status](https://travis-ci.org/mcveanlab/mccortex.svg?branch=master)](https://travis-ci.org/mcveanlab/mccortex)\ndevelop:       | [![Build Status](https://travis-ci.org/mcveanlab/mccortex.svg?branch=develop)](https://travis-ci.org/mcveanlab/mccortex)\ncode analysis: | [![Coverity Scan Build Status](https://scan.coverity.com/projects/2329/badge.svg)](https://scan.coverity.com/projects/2329)\n\nBuild\n-----\n\nMcCortex compiles with clang and gcc. Tested on Mac OS X and linux. Requires zlib.\nDownload with:\n\n    git clone --recursive https://github.com/mcveanlab/mccortex\n\nInstall dependencies (for htslib) on mac:\n\n    brew update\n    brew install xz\n\nOr on linux:\n\n    sudo apt install liblzma-dev libbz2-dev\n    sudo apt install r-base-core  # if you want to plot with R\n\nTo compile for a maximum kmer size of 31:\n\n    make all\n\nto compile for a maximum kmer size of 63:\n\n    make MAXK=63 all\n\nExecutables appear in the `bin/` directory.\n\n\nQuickstart: Variant calling\n---------------------------\n\nDownload and compile McCortex. Can be in any directory, later I'll assume it's in `~/mccortex/`:\n\n    git clone --recursive https://github.com/mcveanlab/mccortex\n    cd mccortex\n    make all MAXK=31\n    make all MAXK=63\n\nNow write a file detailing your samples and their data. Columns are separated by one or more spaces/tabs. File entries are separated by commas. Paired-end read files are separated by a colon ':'. 
File paths can be relative to the current directory or absolute. Most fileformats are supported:\n\n    cd /path/to/your/data\n    echo \"#sample_name  SE_files   PE_files                     interleaved_files\" \u003e  samples.txt\n    echo \"Mickey        a.fa,b.fa  reads.1.fq.gz:reads.2.fq.gz  .\"                 \u003e\u003e samples.txt\n    echo \"Minney        .          reads.1.fq.gz:reads.2.fq.gz  in.bam\"            \u003e\u003e samples.txt\n    echo \"Pluto         seq.fq     .                            pluto.cram\"        \u003e\u003e samples.txt\n\nCreate a job file from your sample file (`samples.txt`). All output will go into the directory we specify (`mc_calls`). We also specify the kmer(s) to use. We'll run at `k=31` and `k=61` and merge the results.\n\nIf your data are haploid, we set `--ploidy 1`:\n\n    ~/mccortex/scripts/make-pipeline.pl -r /path/to/ref.fa --ploidy 1 31,61 mc_calls samples.txt \u003e job.k31.k61.mk\n\nIf your samples are human, you have a mix of haploid and diploid chromosomes. Therefore you need to specify which samples have only one copy of `chrX` and one of `chrY`. The format is `-P \u003csample\u003e:\u003cchr\u003e:\u003cploidy\u003e` where `\u003csample\u003e` and `\u003cchr\u003e` can be comma-separated lists. Ploidy arguments are read in order.\n\n    ~/mccortex/scripts/make-pipeline.pl -r /path/to/ref.fa --ploidy \"-P .:.:2 -P .:chrY:1 -P Mickey:chrX:1\" 31,61 mc_calls samples.txt \u003e job.k31.k61.mk\n\nNow you're ready to run. You'll need to pass:\n- path to McCortex `CTXDIR=`\n- how much memory to use `MEM=`  (2GB for ten E. coli, 70GB for a human)\n- number of threads to use `NTHREADS=`\n\nRun the job file:\n\n    make -f job.k31.k61.mk CTXDIR=~/mccortex MEM=70GB NTHREADS=8 \\\n                           JOINT_CALLING=yes USE_LINKS=no brk-geno-vcf\n\nFor a human genome, running time will be about 8 hours for a single sample and use about 70GB RAM. 
For small numbers of similar samples, peak memory usage will remain the same as a single sample, and should increase roughly logarithmically with the number of samples.\n\nJob finished? Your results are in: `mc_calls/vcfs/breakpoints.joint.plain.k31.k61.geno.vcf.gz`.\n\nSomething go wrong? Take a look at the log file of the last command that ran. You may need to increase memory or compile for a different `MAXK=` value. Once you've fixed the issue, just rerun the `make -f job...` command. Add `--dry-run` to the `make` command to see which commands are going to be run without running them. \n\n*De novo genotyping:* once de Bruijn graphs have been constructed, they can be used to genotype existing call sets (VCF+ref) without using mapped reads. See [the wiki](https://github.com/mcveanlab/mccortex/wiki/VCF-Genotyping).\n\nCommands\n--------\n\n    usage: mccortex31 \u003ccommand\u003e [options] \u003cargs\u003e\n    version: ctx=XXXX zlib=1.2.5 htslib=1.2.1 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31\n    \n    Commands:   breakpoints  use a trusted assembled genome to call large events\n                bubbles      find bubbles in graph which are potential variants\n                build        construct cortex graph from FASTA/FASTQ/BAM\n                calls2vcf    convert bubble/breakpoint calls to VCF\n                check        load and check graph (.ctx) and path (.ctp) files\n                clean        clean errors from a graph\n                contigs      assemble contigs for a sample\n                correct      error correct reads\n                coverage     print contig coverage\n                dist         make colour kmer distance matrix\n                index        index a sorted cortex graph file\n                inferedges   infer graph edges between kmers before calling `thread`\n                join         combine graphs, filter graph intersections\n                links        clean and plot link files (.ctp)\n                pjoin        
merge link files (.ctp)\n                popbubbles   pop bubbles in the population graph\n                pview        text view of a cortex link file (.ctp)\n                reads        filter reads against a graph\n                rmsubstr     reduce set of strings to remove substrings\n                server       interactively query the graph\n                sort         sort the kmers in a graph file\n                subgraph     filter a subgraph using seed kmers\n                thread       thread reads through cleaned graph to make links\n                uniqkmers    generate random unique kmers\n                unitigs      pull out unitigs in FASTA, DOT or GFA format\n                vcfcov       coverage of a VCF against cortex graphs\n                vcfgeno      genotype a VCF after running vcfcov\n                view         text view of a cortex graph file (.ctx)\n\n\n      Type a command with no arguments to see help.\n    \n    Common Options:\n      -h, --help            Help message\n      -q, --quiet           Silence status output normally printed to STDERR\n      -f, --force           Overwrite output files if they already exist\n      -m, --memory \u003cM\u003e      Memory e.g. 1GB [default: 1GB]\n      -n, --nkmers \u003cH\u003e      Hash entries [default: 4M, ~4 million]\n      -t, --threads \u003cT\u003e     Limit on proccessing threads [default: 2]\n      -o, --out \u003cfile\u003e      Output file\n      -p, --paths \u003cin.ctp\u003e  Assembly file to load (can specify multiple times)\n\nGetting Help\n------------\n\nType a command with no arguments to see usage. 
The following may also be useful:\n* [wiki](https://github.com/mcveanlab/mccortex/wiki)\n* [website](http://mcveanlab.github.io/mccortex)\n* [mailing list](https://groups.google.com/forum/#!forum/cortex_var)\n* Report a [bug / feature request](https://github.com/mcveanlab/mccortex/issues) on GitHub\n* Email me: Isaac Turner \u003cturner.isaac@gmail.com\u003e\n\n\nCode And Contributing\n---------------------\n\nIssues can be submitted on GitHub. Pull requests are welcome. Please add your name\nto the AUTHORS file. Code should compile on Mac/Linux with clang/gcc without errors or warnings.\n\nMore on the [wiki](https://github.com/mcveanlab/mccortex/wiki/Contributing).\n\nUnit tests are run with `make test` and integration tests with `cd tests; ./run`. Both of these test suites are run automatically with Travis CI when commits are pushed to GitHub. \n\nStatic analysis can be run with [cppcheck](http://cppcheck.sourceforge.net):\n\n    cppcheck src\n\nor with [clang](http://clang-analyzer.llvm.org):\n\n    rm -rf bin/mccortex31\n    scan-build make RECOMPILE=1\n\nOccasionally we also run Coverity Scan. 
This is done by pushing to the `coverity_scan` branch on github, which triggers Travis CI to upload the latest code to Coverity.\n\n[![Coverity Scan Build Status](https://scan.coverity.com/projects/2329/badge.svg)](https://scan.coverity.com/projects/2329)\n\n    git checkout coverity_scan\n    git merge develop\n    git checkout --ours .travis.yml\n\nLicense: MIT\n------------\n\nBundled libraries may have different licenses:\n* [BitArray](https://github.com/noporpoise/BitArray) (Public Domain)\n* [cJSON](http://http://sourceforge.net/projects/cjson/) (MIT)\n* [CityHash](https://code.google.com/p/cityhash/) (MIT)\n* [htslib](https://github.com/samtools/htslib) (MIT)\n* [lookup3](http://burtleburtle.net/bob/c/lookup3.c) (Public Domain)\n* [madcrowlib](https://github.com/noporpoise/madcrowlib) (MIT)\n* [msg-pool](https://github.com/noporpoise/msg-pool) (Public Domain)\n* [seq-align](https://github.com/noporpoise/seq-align) (Public Domain)\n* [seq_file](https://github.com/noporpoise/seq_file) (Public Domain)\n* [sort_r](https://github.com/noporpoise/sort_r) (Public Domain)\n* [carrays](https://github.com/noporpoise/carrays) (Public Domain)\n* [string_buffer](https://github.com/noporpoise/string_buffer) (Public Domain)\n* [xxHash](https://github.com/Cyan4973/xxHash.git) (BSD)\n\nUsed in testing:\n* [bcftools](https://github.com/samtools/bcftools) (MIT)\n* [bioinf-perl](https://github.com/noporpoise/bioinf-perl) (Public Domain)\n* [bwa](https://github.com/lh3/bwa) (MIT)\n* [readsim](https://github.com/noporpoise/readsim) (Public Domain)\n* [samtools](https://github.com/samtools/samtools) (MIT)\n* [bfc](https://github.com/lh3/bfc) (MIT)\n\nCiting\n------\n\nIf you find McCortex useful, please cite our paper:\n\n* Integrating long-range connectivity information into de Bruijn graphs\n  Turner I, Garimella K, Iqbal Z, McVean G (Bioinformatics) (Advanced access 15 March 2018) 
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty157/4938484\n\nOther Cortex papers:\n\n* De novo assembly and genotyping of variants using colored de Bruijn graphs,\n  Iqbal(*), Caccamo(*), Turner, Flicek, McVean (Nature Genetics) (2012)\n  (doi:10.1038/ng.1028) http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3272472\n* High-throughput microbial population genomics using the Cortex variation assembler,\n  Iqbal, Turner, McVean (Bioinformatics) (Nov 2012)\n  (doi:10.1093/bioinformatics/bts673) http://www.ncbi.nlm.nih.gov/pubmed/23172865\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmcveanlab%2Fmccortex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmcveanlab%2Fmccortex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmcveanlab%2Fmccortex/lists"}