{"id":19234332,"url":"https://github.com/moshi4/cogclassifier","last_synced_at":"2025-05-12T13:13:27.979Z","repository":{"id":62563385,"uuid":"471279149","full_name":"moshi4/COGclassifier","owner":"moshi4","description":"A tool for classifying prokaryote protein sequences into COG(Cluster of Orthologous Genes) functional category","archived":false,"fork":false,"pushed_at":"2025-05-03T06:14:36.000Z","size":35267,"stargazers_count":65,"open_issues_count":0,"forks_count":6,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-12T13:13:14.921Z","etag":null,"topics":["bioinformatics","cog","comparative-genomics","functional-analysis","functional-annotation","genome-analysis","genomics","microbial-genomics","protein","python","visualization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/moshi4.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-03-18T07:48:59.000Z","updated_at":"2025-05-07T20:50:28.000Z","dependencies_parsed_at":"2024-12-16T05:06:10.224Z","dependency_job_id":"6c6187b7-91a5-4fa1-91dc-cf68921a0b35","html_url":"https://github.com/moshi4/COGclassifier","commit_stats":{"total_commits":82,"total_committers":2,"mean_commits":41.0,"dds":"0.012195121951219523","last_synced_commit":"df52c113bcc81e3aea7c0f70944bf513437e8ac8"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moshi4%2FCOGclassifier","tags_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/moshi4%2FCOGclassifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moshi4%2FCOGclassifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moshi4%2FCOGclassifier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/moshi4","download_url":"https://codeload.github.com/moshi4/COGclassifier/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253745195,"owners_count":21957319,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","cog","comparative-genomics","functional-analysis","functional-annotation","genome-analysis","genomics","microbial-genomics","protein","python","visualization"],"created_at":"2024-11-09T16:13:28.923Z","updated_at":"2025-05-12T13:13:27.955Z","avatar_url":"https://github.com/moshi4.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# COGclassifier\n\n![Python3](https://img.shields.io/badge/Language-Python3-steelblue)\n![OS](https://img.shields.io/badge/OS-Windows_|_Mac_|_Linux-steelblue)\n![License](https://img.shields.io/badge/License-MIT-steelblue)\n[![Latest PyPI version](https://img.shields.io/pypi/v/cogclassifier.svg)](https://pypi.python.org/pypi/cogclassifier)\n[![Bioconda](https://img.shields.io/conda/vn/bioconda/cogclassifier.svg?color=green)](https://anaconda.org/bioconda/cogclassifier)\n![CI workflow](https://github.com/moshi4/COGclassifier/actions/workflows/ci.yml/badge.svg)\n\n## Table of 
Contents\n\n- [Overview](#overview)\n- [Installation](#installation)\n- [Workflow](#workflow)\n- [Usage](#usage)\n- [Output Contents](#output-contents)\n- [Customize Charts](#customize-charts)\n\n## Overview\n\nCOG(Cluster of Orthologous Genes) is a database that plays an important role in the annotation, classification, and analysis of microbial gene function.\nFunctional annotation, classification, and analysis of each gene in newly sequenced bacterial genomes using the COG database is a common task.\nHowever, no COG functional classification command-line software was both easy to use and capable of producing publication-ready figures.\nTherefore, I developed COGclassifier to fill this need.\nCOGclassifier can automatically perform the processes from searching query sequences against the COG database, to annotation and classification of gene functions, to generation of publication-ready figures (see figures below).\n\n![ecoli_barchart_fig](https://raw.githubusercontent.com/moshi4/COGclassifier/main/example/output/ecoli/cog_count_barchart.png)  \nFig.1: Barchart of COG functional category classification result for E. coli\n\n![ecoli_piechart_fig](https://raw.githubusercontent.com/moshi4/COGclassifier/main/example/output/ecoli/cog_count_piechart.png)  \nFig.2: Piechart of COG functional category classification result for E. coli\n\n## Installation\n\n`Python 3.9 or later` is required for installation. Installation of RPS-BLAST(ncbi-blast+) is also necessary.\n\n**Install bioconda package:**\n\n    conda install -c conda-forge -c bioconda cogclassifier\n\n**Install PyPI stable package:**\n\n    pip install cogclassifier\n\n## Workflow\n\nDescription of COGclassifier's automated workflow.\nThis workflow was created based in part on [cdd2cog](https://github.com/aleimba/bac-genomics-scripts/tree/master/cdd2cog).\n\n### 1. 
Setup COG \u0026 CDD resources\n\nDownload \u0026 load 4 required COG \u0026 CDD files from FTP site.\n\n- `cog-24.fun.tab` (\u003chttps://ftp.ncbi.nih.gov/pub/COG/COG2024/data/cog-24.fun.tab\u003e)  \n    Descriptions of COG functional categories.  \n    This resource file is included in the package as `cog_func_category.tsv`.  \n\n    \u003cdetails\u003e\n    \u003csummary\u003eShow more information\u003c/summary\u003e\n\n    \u003e Tab-delimited plain text file with descriptions of COG functional categories  \n    \u003e The categories form four functional groups:  \n    \u003e 1\\. INFORMATION STORAGE AND PROCESSING  \n    \u003e 2\\. CELLULAR PROCESSES AND SIGNALING  \n    \u003e 3\\. METABOLISM  \n    \u003e 4\\. POORLY CHARACTERIZED  \n    \u003e Columns:  \n    \u003e 1\\. Functional category ID (one letter)  \n    \u003e 2\\. Functional group (1-4, as above)  \n    \u003e 3\\. Hexadecimal RGB color associated with the functional category  \n    \u003e 4\\. Functional category description  \n    \u003e Each line corresponds to one functional category. The order of the categories is meaningful (reflects a hierarchy of functions; determines the order of display)  \n    \u003e\n    \u003e (From \u003chttps://ftp.ncbi.nih.gov/pub/COG/COG2024/data/Readme.COG2024.txt\u003e)  \n\n    \u003c/details\u003e\n\n- `cog-24.def.tab` (\u003chttps://ftp.ncbi.nih.gov/pub/COG/COG2024/data/cog-24.def.tab\u003e)  \n    COG descriptions such as 'COG ID', 'COG functional category', 'COG name', etc...  \n    This resource file is included in the package as `cog_definition.tsv`.  \n\n    \u003cdetails\u003e\n    \u003csummary\u003eShow more information\u003c/summary\u003e\n\n    \u003e Tab-delimited plain text file with COG descriptions  \n    \u003e Columns:  \n    \u003e 1\\. COG ID  \n    \u003e 2\\. COG functional category (could include multiple letters in the order of importance)  \n    \u003e 3\\. COG name  \n    \u003e 4\\. 
Gene name associated with the COG (optional)  \n    \u003e 5\\. Functional pathway associated with the COG (optional)  \n    \u003e 6\\. PubMed ID, associated with the COG (multiple entries are semicolon-separated; optional)  \n    \u003e 7\\. PDB ID of the structure associated with the COG (multiple entries are semicolon-separated; optional)  \n    \u003e Each line corresponds to one COG. The order of the COGs is arbitrary (displayed in the lexicographic order)  \n    \u003e\n    \u003e (From \u003chttps://ftp.ncbi.nih.gov/pub/COG/COG2024/data/Readme.COG2024.txt\u003e)\n\n    \u003c/details\u003e\n\n- `cddid.tbl.gz` (\u003chttps://ftp.ncbi.nih.gov/pub/mmdb/cdd/\u003e)  \n    Summary information about the CD(Conserved Domain) model.  \n\n    \u003cdetails\u003e\n    \u003csummary\u003eShow more information\u003c/summary\u003e\n\n    \u003e\"cddid.tbl.gz\" contains summary information about the CD models in this\n    \u003edistribution, which are part of the default \"cdd\" search database and are\n    \u003eindexed in NCBI's Entrez database. This is a tab-delimited text file, with a\n    \u003esingle row per CD model and the following columns:  \n    \u003e\n    \u003ePSSM-Id (unique numerical identifier)  \n    \u003eCD accession (starting with 'cd', 'pfam', 'smart', 'COG', 'PRK' or \"CHL')  \n    \u003eCD \"short name\"  \n    \u003eCD description  \n    \u003ePSSM-Length (number of columns, the size of the search model)  \n    \u003e\n    \u003e (From \u003chttps://ftp.ncbi.nih.gov/pub/mmdb/cdd/README\u003e)\n\n    \u003c/details\u003e\n\n- `Cog_LE.tar.gz` (\u003chttps://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/\u003e)  \n    COG database, a part of CDD(Conserved Domain Database), for RPS-BLAST search.  \n\n### 2. RPS-BLAST search against COG database\n\nRun query sequences RPS-BLAST against COG database [Default: E-value = 1e-2].\nBest-hit (=lowest e-value) blast results are extracted and used in next functional classification step.\n\n### 3. 
Classify query sequences into COG functional category\n\nFrom best-hit results, extract relationship between query sequences and COG functional category as described below.\n\n1. Best-hit results -\u003e CDD ID\n2. CDD ID -\u003e COG ID (From `cddid.tbl.gz`)\n3. COG ID -\u003e COG Functional Category Letter (From `cog-24.def.tab`)\n4. COG Functional Category Letter -\u003e COG Functional Category Definition (From `cog-24.fun.tab`)\n\n\u003e :warning:\n\u003e If functional category with multiple letters exists, first letter is treated as functional category\n\u003e (e.g. COG4862 has multiple letters `KTN`. A letter `K` is treated as functional category).\n\nUsing the above information, the number of query sequences classified into each COG functional category is calculated and\nfunctional annotation and classification results are output.\n\n## Usage\n\n### Basic Command\n\n    COGclassifier -i [protein fasta file] -o [output directory]\n\n### Options\n\n    $ COGclassifier --help\n                                                                                                                          \n    Usage: COGclassifier [OPTIONS]                                                                                       \n                                                                                                                          \n    A tool for classifying prokaryote protein sequences into COG functional category                                     \n                                                                                                                          \n    ╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n    │ *  --infile        -i        Input query protein fasta file [required]                                             │\n    │ *  --outdir        -o        Output directory [required]                                                           │\n    │    
--download_dir  -d        Download COG \u0026 CDD resources directory [default: /home/user/.cache/cogclassifier_v2]  │\n    │    --thread_num    -t        RPS-BLAST num_thread parameter [default: MaxThread - 1]                               │\n    │    --evalue        -e        RPS-BLAST e-value parameter [default: 0.01]                                           │\n    │    --quiet         -q        No print log on screen                                                                │\n    │    --version       -v        Print version information                                                             │\n    │    --help          -h        Show this message and exit.                                                           │\n    ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n\n### Example Command\n\nClick [here](https://github.com/moshi4/COGclassifier/raw/main/example/example.zip) to download example protein fasta files.\n\n    COGclassifier -i ./example/ecoli.faa -o ./ecoli_cogclassifier\n\n## Output Contents\n\n- **`rpsblast.tsv`** ([example](https://github.com/moshi4/COGclassifier/blob/main/example/output/mycoplasma/rpsblast.tsv))  \n  RPS-BLAST against COG database result (format = `outfmt 6`).  \n\n- **`cog_classify.tsv`** ([example](https://github.com/moshi4/COGclassifier/blob/main/example/output/mycoplasma/cog_classify.tsv))  \n  Query sequences classified into COG functional category result.  \n  This file contains all classified query sequences and associated COG information.  
\n\n    \u003cdetails\u003e\n    \u003csummary\u003eTable of detailed tsv format information (9 columns)\u003c/summary\u003e\n\n    | Columns          | Contents                               | Example Value                       |\n    | ---------------- | -------------------------------------- | ----------------------------------- |\n    | QUERY_ID         | Query ID                               | NP_414544.1                         |\n    | COG_ID           | COG ID of RPS-BLAST top hit result     | COG0083                             |\n    | CDD_ID           | CDD ID of RPS-BLAST top hit result     | 223161                              |\n    | EVALUE           | RPS-BLAST top hit evalue               | 2.5e-150                            |\n    | IDENTITY         | RPS-BLAST top hit identity             | 45.806                              |\n    | GENE_NAME        | Abbreviated gene name                  | ThrB                                |\n    | COG_NAME         | COG gene name                          | Homoserine kinase                   |\n    | COG_LETTER       | Letter of COG functional category      | E                                   |\n    | COG_DESCRIPTION  | Description of COG functional category | Amino acid transport and metabolism |\n\n    \u003c/details\u003e\n\n- **`cog_count.tsv`** ([example](https://github.com/moshi4/COGclassifier/blob/main/example/output/ecoli/cog_count.tsv))  \n  Count classified sequences per COG functional category result.  
\n\n    \u003cdetails\u003e\n    \u003csummary\u003eTable of detailed tsv format information (5 columns)\u003c/summary\u003e\n\n    | Columns     | Contents                                | Example Value                                   |\n    | ------------| --------------------------------------- | ----------------------------------------------- |\n    | LETTER      | Letter of COG functional category       | J                                               |\n    | COUNT       | Count of COG classified sequence        | 259                                             |\n    | GROUP       | COG functional group                    | INFORMATION STORAGE AND PROCESSING              |\n    | COLOR       | Symbol color of COG functional category | #FCCCFC                                       |\n    | DESCRIPTION | Description of COG functional category  | Translation, ribosomal structure and biogenesis |\n\n    \u003c/details\u003e\n\n- **`cogclassifier.log`** ([example](https://github.com/moshi4/COGclassifier/blob/main/example/output/ecoli/cogclassifier.log))  \n  COGclassifier log file.\n\n- **`cog_count_barchart.[png|html]`**  \n  Barchart of COG functional category classification result.  \n  COGclassifier uses [`Altair`](https://altair-viz.github.io/) visualization library for plotting charts.  \n\n  ![cog_count_barchart](https://raw.githubusercontent.com/moshi4/COGclassifier/main/example/output/ecoli/cog_count_barchart.png)\n\n- **`cog_count_piechart.[png|html]`**  \n  Piechart of COG functional category classification result.  \n  Functional categories with percentages below 1% are not labeled with their letter on the piechart.  
\n\n  ![cog_count_piechart](https://raw.githubusercontent.com/moshi4/COGclassifier/main/example/output/ecoli/cog_count_piechart.png)\n\n## Customize Charts\n\nCOGclassifier also provides a barchart \u0026 piechart plotting API/CLI to customize chart appearance.\nSee the [notebook](https://github.com/moshi4/COGclassifier/blob/main/example/plot/plot_example.ipynb) and the commands below for details.\n\n### plot_cog_count_barchart\n\n    $ plot_cog_count_barchart --help\n                                                                                                  \n    Usage: plot_cog_count_barchart [OPTIONS]                                                      \n                                                                                                  \n    Plot COGclassifier count barchart figure                                                      \n                                                                                                  \n    ╭─ Options ───────────────────────────────────────────────────────────────────────────────────╮\n    │ *  --infile         -i        Input COG count result file ('cog_count.tsv') [required]      │\n    │ *  --outfile        -o        Output barchart figure file (*.png|*.svg|*.html) [required]   │\n    │    --width                    Figure pixel width [default: 440]                             │\n    │    --height                   Figure pixel height [default: 340]                            │\n    │    --bar_width                Figure bar width [default: 15]                                │\n    │    --y_limit                  Y-axis max limit value                                        │\n    │    --percent_style            Plot percent style instead of number count                    │\n    │    --sort                     Enable descending sort by number count                        │\n    │    --dpi                      Figure DPI [default: 100]                                     │\n    │    --help           
-h        Show this message and exit.                                   │\n    ╰─────────────────────────────────────────────────────────────────────────────────────────────╯\n\n### plot_cog_count_piechart\n\n    $ plot_cog_count_piechart --help\n                                                                                                  \n    Usage: plot_cog_count_piechart [OPTIONS]                                                      \n                                                                                                  \n    Plot COGclassifier count piechart figure                                                      \n                                                                                                  \n    ╭─ Options ───────────────────────────────────────────────────────────────────────────────────╮\n    │ *  --infile       -i        Input COG count result file ('cog_count.tsv') [required]        │\n    │ *  --outfile      -o        Output piechart figure file (*.png|*.svg|*.html) [required]     │\n    │    --width                  Figure pixel width [default: 380]                               │\n    │    --height                 Figure pixel height [default: 380]                              │\n    │    --show_letter            Show functional category lettter on piechart                    │\n    │    --sort                   Enable descending sort by number count                          │\n    │    --dpi                    Figure DPI [default: 100]                                       │\n    │    --help         -h        Show this message and exit.                                     
│\n    ╰─────────────────────────────────────────────────────────────────────────────────────────────╯\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoshi4%2Fcogclassifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmoshi4%2Fcogclassifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoshi4%2Fcogclassifier/lists"}