{"id":15060115,"url":"https://github.com/tanghaibao/jcvi","last_synced_at":"2025-05-14T06:12:38.565Z","repository":{"id":1214075,"uuid":"1130393","full_name":"tanghaibao/jcvi","owner":"tanghaibao","description":"Python library to facilitate genome assembly, annotation, and comparative genomics","archived":false,"fork":false,"pushed_at":"2025-05-09T06:40:51.000Z","size":19758,"stargazers_count":817,"open_issues_count":54,"forks_count":190,"subscribers_count":36,"default_branch":"main","last_synced_at":"2025-05-09T07:41:49.665Z","etag":null,"topics":["allmaps","assembly","bioinformatics","blast","comparative-genomics","genetic-maps","genome-sequencing","genomics","sequence-alignments","synteny","variant-calling"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"OttoRobotto/meteor-three","license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tanghaibao.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2010-12-01T23:18:02.000Z","updated_at":"2025-05-06T21:37:25.000Z","dependencies_parsed_at":"2023-07-05T18:32:32.733Z","dependency_job_id":"b6920d9b-a7b6-4261-bc38-3a3330aa4043","html_url":"https://github.com/tanghaibao/jcvi","commit_stats":{"total_commits":2897,"total_committers":28,"mean_commits":"103.46428571428571","dds":"0.21366931308249915","last_synced_commit":"cd4da7431c2d884be395b74e698aec10bd02aba7"},"previous_names":[],"tags_count":101,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tanghaibao%2Fjcvi","tags_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/tanghaibao%2Fjcvi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tanghaibao%2Fjcvi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tanghaibao%2Fjcvi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tanghaibao","download_url":"https://codeload.github.com/tanghaibao/jcvi/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254080562,"owners_count":22011443,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["allmaps","assembly","bioinformatics","blast","comparative-genomics","genetic-maps","genome-sequencing","genomics","sequence-alignments","synteny","variant-calling"],"created_at":"2024-09-24T22:53:10.355Z","updated_at":"2025-05-14T06:12:38.555Z","avatar_url":"https://github.com/tanghaibao.png","language":"Python","funding_links":[],"categories":["基因"],"sub_categories":["资源传输下载"],"readme":"# JCVI: A Versatile Toolkit for Comparative Genomics Analysis\n\n[![Latest PyPI version](https://img.shields.io/pypi/v/jcvi.svg)](https://pypi.python.org/pypi/jcvi)\n[![bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/jcvi/README.html?highlight=jcvi)\n[![Github Actions](https://github.com/tanghaibao/jcvi/workflows/build/badge.svg)](https://github.com/tanghaibao/jcvi/actions)\n[![Downloads](https://pepy.tech/badge/jcvi)](https://pepy.tech/project/jcvi)\n\nCollection of Python libraries to parse bioinformatics files, 
or perform\ncomputation related to assembly, annotation, and comparative genomics.\n\n|         |                                                                  |\n| ------- | ---------------------------------------------------------------- |\n| Authors | Haibao Tang ([tanghaibao](http://github.com/tanghaibao))         |\n|         | Vivek Krishnakumar ([vivekkrish](https://github.com/vivekkrish)) |\n|         | Adam Taranto ([Adamtaranto](https://github.com/Adamtaranto))     |\n|         | Xingtan Zhang ([tangerzhang](https://github.com/tangerzhang))    |\n|         | Won Cheol Yim ([wyim-pgl](https://github.com/wyim-pgl))          |\n| Email   | \u003ctanghaibao@gmail.com\u003e                                           |\n| License | [BSD](http://creativecommons.org/licenses/BSD/)                  |\n\n## How to cite\n\n\u003e [!TIP]\n\u003e JCVI is now published in iMeta!\n\u003e\n\u003e _Tang et al. (2024) JCVI: A Versatile Toolkit for Comparative Genomics\n\u003e Analysis. [iMeta](https://doi.org/10.1002/imt2.211)_\n\n![MCSCAN example](https://www.dropbox.com/s/9vl3ys3ndvimg4c/grape-peach-cacao.png?raw=1)\n\n![ALLMAPS animation](https://www.dropbox.com/s/jfs8xavcxix37se/ALLMAPS.gif?raw=1)\n\n![GRABSEEDS example](https://www.dropbox.com/s/yu9ehsi6sqifuaa/bluredges.png?raw=1)\n\n## Contents\n\nThe following modules are available as generic bioinformatics handling\nmethods.\n\n- \u003ckbd\u003ealgorithms\u003c/kbd\u003e\n\n  - Linear programming solver with SCIP and GLPK.\n  - Supermap: find a set of non-overlapping anchors in BLAST or NUCMER output.\n  - Longest or heaviest increasing subsequence.\n  - Matrix operations.\n\n- \u003ckbd\u003eapps\u003c/kbd\u003e\n\n  - GenBank entrez accession, Phytozome, Ensembl and SRA downloader.\n  - Calculate (non)synonymous substitution rate between gene pairs.\n  - Basic phylogenetic tree construction using PHYLIP, PhyML, or RAxML, and visualization.\n  - Wrapper for BLAST+, LASTZ, LAST, BWA, BOWTIE2, CLC, CDHIT, CAP3, 
etc.\n\n- \u003ckbd\u003eformats\u003c/kbd\u003e\n\n  Currently supports `.ace` format (phrap, cap3, etc.), `.agp`\n  (goldenpath), `.bed` format, `.blast` output, `.btab` format,\n  `.coords` format (`nucmer` output), `.fasta` format, `.fastq`\n  format, `.fpc` format, `.gff` format, `obo` format (ontology),\n  `.psl` format (UCSC blat, GMAP, etc.), `.posmap` format (Celera\n  assembler output), `.sam` format (read mapping), `.contig`\n  format (TIGR assembly format), etc.\n\n- \u003ckbd\u003egraphics\u003c/kbd\u003e\n\n  - BLAST or synteny dot plot.\n  - Histogram using R and ASCII art.\n  - Paint regions on set of chromosomes.\n  - Macro-synteny and micro-synteny plots.\n  - Ribbon plots from whole genome alignments.\n\n- \u003ckbd\u003eutils\u003c/kbd\u003e\n  - Grouper can be used as disjoint set data structure.\n  - range contains common range operations, like overlap\n    and chaining.\n  - Miscellaneous cookbook recipes, iterators decorators,\n    table utilities.\n\nThen there are modules that contain domain-specific methods.\n\n- \u003ckbd\u003eassembly\u003c/kbd\u003e\n\n  - K-mer histogram analysis.\n  - Preparation and validation of tiling path for clone-based assemblies.\n  - Scaffolding through ALLMAPS, optical map and genetic map.\n  - Pre-assembly and post-assembly QC procedures.\n\n- \u003ckbd\u003eannotation\u003c/kbd\u003e\n\n  - Training of _ab initio_ gene predictors.\n  - Calculate gene, exon and intron statistics.\n  - Wrapper for PASA and EVM.\n  - Launch multiple MAKER processes.\n\n- \u003ckbd\u003ecompara\u003c/kbd\u003e\n  - C-score based BLAST filter.\n  - Synteny scan (de-novo) and lift over (find nearby anchors).\n  - Ancestral genome reconstruction using Sankoff's and PAR method.\n  - Ortholog and tandem gene duplicates finder.\n\n## Applications\n\nPlease visit [wiki](https://github.com/tanghaibao/jcvi/wiki) for\nfull-fledged applications.\n\n## Dependencies\n\nJCVI requires Python3 between v3.9 and v3.12.\n\nSome graphics modules 
require the [ImageMagick](https://imagemagick.org/index.php) library.\n\nOn macOS this can be installed using Conda (see next section). If you are using a Linux system (e.g. Ubuntu), you can install ImageMagick using apt-get:\n\n```bash\nsudo apt-get update\nsudo apt-get install libmagickwand-dev\n```\n\nSee the [Wand](https://docs.wand-py.org/en/0.2.4/guide/install.html) docs for instructions on installing ImageMagick on other systems.\n\nA few modules may ask for the locations of external programs\nif the executables cannot be found in your `PATH`.\n\nThe external programs that are often used are:\n\n- [Kent tools](http://hgdownload.cse.ucsc.edu/admin/jksrc.zip)\n- [BEDTOOLS](http://code.google.com/p/bedtools/)\n- [EMBOSS](http://emboss.sourceforge.net/)\n\n### Managing dependencies with Conda\n\nYou can use the YAML files in this repo to create an environment with basic JCVI dependencies.\n\nIf you are new to Conda, we recommend the [Miniforge](https://conda-forge.org/download/) distribution.\n\n```bash\nconda env create -f environment.yml\n\nconda activate jcvi\n```\n\nNote: If you are using a Mac with an ARM64 (Apple Silicon) processor, some dependencies are not currently available from Bioconda for this architecture.\n\nYou can instead create a virtual OSX64 (Intel) env like this:\n\n```bash\nconda env create -f env_osx64.yml\n\nconda activate jcvi-osx64\n```\n\nAfter activating the Conda environment, install JCVI using one of the following options.\n\n## Installation\n\n### Installation options\n\n1) Use pip to install the latest development version directly from this repo.\n\n```bash\npip install git+https://github.com/tanghaibao/jcvi.git\n```\n\n2) Install the latest release from PyPI.\n\n```bash\npip install jcvi\n```\n\n3) Alternatively, install in development mode.\n\n```bash\ngit clone https://github.com/tanghaibao/jcvi.git \u0026\u0026 cd jcvi\npip install -e '.[tests]'\n```\n\n### Test Installation\n\nIf installed successfully, you can check the 
version with:\n\n```bash\njcvi --version\n```\n\n## Usage\n\nUse `python -m` to call any of the modules installed with JCVI.\n\nMost of the modules in this package contain multiple actions. Taking\nthe `fasta` module as an example:\n\n```console\nUsage:\n    python -m jcvi.formats.fasta ACTION\n\n\nAvailable ACTIONs:\n          clean | Remove irregular chars in FASTA seqs\n           diff | Check if two fasta records contain same information\n        extract | Given fasta file and seq id, retrieve the sequence in fasta format\n          fastq | Combine fasta and qual to create fastq file\n         filter | Filter the records by size\n         format | Trim accession id to the first space or switch id based on 2-column mapping file\n        fromtab | Convert 2-column sequence file to FASTA format\n           gaps | Print out a list of gap sizes within sequences\n             gc | Plot G+C content distribution\n      identical | Given 2 fasta files, find all exactly identical records\n            ids | Generate a list of headers\n           info | Run `sequence_info` on fasta files\n          ispcr | Reformat paired primers into isPcr query format\n           join | Concatenate a list of seqs and add gaps in between\n     longestorf | Find longest orf for CDS fasta\n           pair | Sort paired reads to .pairs, rest to .fragments\n    pairinplace | Starting from fragment.fasta, find if adjacent records can form pairs\n           pool | Pool a bunch of fastafiles together and add prefix\n           qual | Generate dummy .qual file based on FASTA file\n         random | Randomly take some records\n         sequin | Generate a gapped fasta file for sequin submission\n       simulate | Simulate random fasta file for testing\n           some | Include or exclude a list of records (also performs on .qual file if available)\n           sort | Sort the records by IDs, sizes, etc.\n        summary | Report the real no of bases and N's in fasta files\n           tidy | Normalize gap 
sizes and remove small components in fasta\n      translate | Translate CDS to proteins\n           trim | Given a cross_match screened fasta, trim the sequence\n      trimsplit | Split sequences at lower-cased letters\n           uniq | Remove records that are the same\n```\n\nWhen you need to use one action, you can just run:\n\n```console\npython -m jcvi.formats.fasta extract\n```\n\nThis will tell you the options and arguments it expects.\n\n**Feel free to check out other scripts in the package; it is not just\nfor FASTA.**\n\n## Star History\n\n[![Star History\nChart](https://api.star-history.com/svg?repos=tanghaibao/jcvi\u0026type=Date)](https://star-history.com/#tanghaibao/jcvi\u0026Date)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftanghaibao%2Fjcvi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftanghaibao%2Fjcvi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftanghaibao%2Fjcvi/lists"}