{"id":23505665,"url":"https://github.com/gamcil/clinker","last_synced_at":"2025-04-10T06:12:46.723Z","repository":{"id":37717419,"uuid":"193022148","full_name":"gamcil/clinker","owner":"gamcil","description":"Gene cluster comparison figure generator","archived":false,"fork":false,"pushed_at":"2024-11-25T03:28:52.000Z","size":3134,"stargazers_count":560,"open_issues_count":43,"forks_count":70,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-04-03T03:09:53.441Z","etag":null,"topics":["bioinformatics","d3js","python","visualization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gamcil.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-06-21T03:11:36.000Z","updated_at":"2025-04-02T02:36:50.000Z","dependencies_parsed_at":"2023-01-29T20:45:45.748Z","dependency_job_id":"313265e3-bcd3-4346-86eb-6301870e5427","html_url":"https://github.com/gamcil/clinker","commit_stats":{"total_commits":168,"total_committers":6,"mean_commits":28.0,"dds":0.0535714285714286,"last_synced_commit":"a9f426be1316c7f8879f56604df42fcc8aedb5cd"},"previous_names":[],"tags_count":29,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamcil%2Fclinker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamcil%2Fclinker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamcil%2Fclinker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/gamcil%2Fclinker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gamcil","download_url":"https://codeload.github.com/gamcil/clinker/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248166925,"owners_count":21058481,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","d3js","python","visualization"],"created_at":"2024-12-25T09:38:35.332Z","updated_at":"2025-04-10T06:12:46.697Z","avatar_url":"https://github.com/gamcil.png","language":"Python","readme":"# clinker\n\n\u003eBoth cblaster and clinker can now be used without installation on the [CAGECAT webserver](http://cagecat.bioinformatics.nl/).\n\nGene cluster comparison figure generator\n\n## What is it?\nclinker is a pipeline for easily generating publication-quality gene cluster\ncomparison figures.\n\n\u003cimg src=\"images/figure.png\" alt=\"bua cluster and homologues\" width=700\u003e\n\nGiven a set of GenBank files, clinker will automatically extract protein translations,\nperform global alignments between sequences in each cluster, determine the\noptimal display order based on cluster similarity, and generate an interactive\nvisualisation (using [clustermap.js](https://github.com/gamcil/clustermap.js))\nthat can be extensively tweaked before being exported as an SVG file.\n\n### A note on scope:\nclinker was designed primarily as a simple way to visualise groups of homologous\nbiosynthetic gene clusters, which are typically small genomic regions with not many genes\n(as in the example GIF). 
It performs pairwise alignments of all genes in all input files using\nthe [aligner built into BioPython](https://biopython.org/docs/1.76/api/Bio.Align.html#Bio.Align.PairwiseAligner),\nthen generates an interactive SVG document in the browser.\nThe alignment stage will scale very poorly to multiple genomes with many genes, and the resulting\nvisualisation will also be very slow given how many SVG elements it will contain.\nIf you are looking to align entire genomes, you will likely be better served using \ntools built for that purpose (e.g. [Cactus](https://github.com/ComparativeGenomicsToolkit/cactus)).\n\n![clinker visualisation demo](images/demo.gif)\n\n## Installation\nclinker can be installed directly through pip:\n\n`pip install clinker`\n\nBy cloning the source code from GitHub:\n\n```\ngit clone https://github.com/gamcil/clinker.git\ncd clinker\npip install .\n```\n\nOr, through conda:\n\n```\nconda create -n clinker -c conda-forge -c bioconda clinker-py\nconda activate clinker\n```\n\n## Citation\nIf you found clinker useful, please cite:\n```\nclinker \u0026 clustermap.js: Automatic generation of gene cluster comparison figures.\nGilchrist, C.L.M., Chooi, Y.-H., 2020.\nBioinformatics. doi: https://doi.org/10.1093/bioinformatics/btab007\n```\n\n## Usage\nRunning clinker can be as simple as:\n\n`clinker clusters/*.gbk`\n\nThis will read in all GenBank files inside the folder, align them, and print\nthe alignments to the terminal. To generate the visualisation, use the `-p/--plot`\nargument: \n\n`clinker clusters/*.gbk -p \u003coptional: file name to save static HTML\u003e`\n\nclinker can also parse GFF3 files:\n\n`clinker cluster1.gff3 cluster2.gff3 -p`\n\nNote: a corresponding FASTA file of the same name (extensions \".fa\", \".fsa\", \".fna\", \".fasta\" or \".faa\") must\nbe found in the same directory as the GFF3, i.e. 
`cluster1.fa` and `cluster2.fa`.\n\nSee `-h/--help` for more information:\n\n```\nusage: clinker [-h] [--version] [-r RANGES [RANGES ...]] [-gf GENE_FUNCTIONS] [-na] [-i IDENTITY] [-j JOBS] [-s SESSION] [-ji JSON_INDENT] [-f] [-o OUTPUT] [-p [PLOT]] [-dl DELIMITER] [-dc DECIMALS] [-hl] [-ha] [-mo MATRIX_OUT] [-ufo] [files ...]\n\nclinker: Automatic creation of publication-ready gene cluster comparison figures.\n\nclinker generates gene cluster comparison figures from GenBank files. It performs pairwise local or global alignments between every sequence in every unique pair of clusters and generates interactive, to-scale comparison figures using the clustermap.js library.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --version             show program's version number and exit\n\nInput options:\n  files                 Gene cluster GenBank files\n  -r RANGES [RANGES ...], --ranges RANGES [RANGES ...]\n                        Scaffold extraction ranges. If a range is specified, only features within the range will be extracted from the scaffold. Ranges should be formatted like: scaffold:start-end (e.g. scaffold_1:15000-40000)\n  -gf GENE_FUNCTIONS, --gene_functions GENE_FUNCTIONS\n                        2-column CSV file containing gene functions, used to build gene groups from same function instead of sequence similarity (e.g. 
GENE_001,PKS-NRPS).\n\nAlignment options:\n  -na, --no_align       Do not align clusters\n  -i IDENTITY, --identity IDENTITY\n                        Minimum alignment sequence identity [default: 0.3]\n  -j JOBS, --jobs JOBS  Number of alignments to run in parallel (0 to use the number of CPUs) [default: 0]\n\nOutput options:\n  -s SESSION, --session SESSION\n                        Path to clinker session\n  -ji JSON_INDENT, --json_indent JSON_INDENT\n                        Number of spaces to indent JSON [default: none]\n  -f, --force           Overwrite previous output file\n  -o OUTPUT, --output OUTPUT\n                        Save alignments to file\n  -p [PLOT], --plot [PLOT]\n                        Plot cluster alignments using clustermap.js. If a path is given, clinker will generate a portable HTML file at that path. Otherwise, the plot will be served dynamically using Python's HTTP server.\n  -dl DELIMITER, --delimiter DELIMITER\n                        Character to delimit output by [default: human readable]\n  -dc DECIMALS, --decimals DECIMALS\n                        Number of decimal places in output [default: 2]\n  -hl, --hide_link_headers\n                        Hide alignment column headers\n  -ha, --hide_aln_headers\n                        Hide alignment cluster name headers\n  -mo MATRIX_OUT, --matrix_out MATRIX_OUT\n                        Save cluster similarity matrix to file\n\nVisualisation options:\n  -ufo, --use_file_order\n                        Display clusters in order of input files\n\nExample usage\n-------------\nAlign clusters, plot results and print scores to screen:\n  $ clinker files/*.gbk\n\nOnly save gene-gene links when identity is over 50%:\n  $ clinker files/*.gbk -i 0.5\n\nSave an alignment session for later:\n  $ clinker files/*.gbk -s session.json\n\nSave alignments to file, in comma-delimited format, with 4 decimal places:\n  $ clinker files/*.gbk -o alignments.csv -dl \",\" -dc 4\n\nGenerate visualisation:\n  $ 
clinker files/*.gbk -p\n\nSave visualisation as a static HTML document:\n  $ clinker files/*.gbk -p plot.html\n\nCameron Gilchrist, 2020\n```\n\n## Defining gene groups by function\n\nBy default, clinker automatically assigns a name and colour for each group of homologous genes.\nYou can instead pre-assign names (i.e. functions) using the `-gf/--gene_functions` argument, which\ntakes a 2-column comma-separated file like:\n\n```\nGENE_001,Cytochrome P450\nGENE_002,Cytochrome P450\nGENE_003,Methyltransferase\nGENE_004,Methyltransferase\n```\n\nThis will generate two groups, Cytochrome P450 (GENE_001 and GENE_002) and Methyltransferase (GENE_003 and GENE_004).\nIf any other homologous genes are identified, they will automatically be added to these groups.\n\nAs of clinker v0.0.28, you can specify colours for the groups defined by the\n`-gf/--gene_functions` argument. To do this, use the `-cm/--colour_map` argument, which\nalso takes a 2-column CSV file containing the group name and hexadecimal colour code like:\n\n```\nCytochrome P450,#FF0000\nMethyltransferase,#0000FF\n```\n","funding_links":[],"categories":["Comparative"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgamcil%2Fclinker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgamcil%2Fclinker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgamcil%2Fclinker/lists"}