{"id":21678012,"url":"https://github.com/pnnl/mercat","last_synced_at":"2025-04-12T05:14:04.029Z","repository":{"id":64014357,"uuid":"81574364","full_name":"pnnl/mercat","owner":"pnnl","description":"MerCat: python code for versatile k-mer counting and diversity estimation for database independent property analysis for meta -ome data","archived":false,"fork":false,"pushed_at":"2022-11-30T15:50:51.000Z","size":2833,"stargazers_count":18,"open_issues_count":0,"forks_count":13,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-26T00:41:31.190Z","etag":null,"topics":["dask","database-independent-analysis","diversity","diversity-estimation","divideandconquer","fastq","k-mer-counting","k-mer-frequency","kmer-frequency-count","metagenomic-analysis","metatranscriptomic-analysis","nucleotides","plotly","protein","python"],"latest_commit_sha":null,"homepage":"https://github.com/pnnl/mercat","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pnnl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-02-10T14:45:02.000Z","updated_at":"2022-11-30T15:42:56.000Z","dependencies_parsed_at":"2022-11-30T17:02:08.267Z","dependency_job_id":null,"html_url":"https://github.com/pnnl/mercat","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pnnl%2Fmercat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pnnl%2Fmercat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pnnl%2Fmercat/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/pnnl%2Fmercat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pnnl","download_url":"https://codeload.github.com/pnnl/mercat/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248519556,"owners_count":21117761,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dask","database-independent-analysis","diversity","diversity-estimation","divideandconquer","fastq","k-mer-counting","k-mer-frequency","kmer-frequency-count","metagenomic-analysis","metatranscriptomic-analysis","nucleotides","plotly","protein","python"],"created_at":"2024-11-25T14:24:47.654Z","updated_at":"2025-04-12T05:14:03.974Z","avatar_url":"https://github.com/pnnl.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# This version of MerCat is deprecated; please use the new, updated version\n# https://github.com/raw-lab/mercat2\n\n===========================================================================================\n\n###### MerCat: Python code for versatile k-mer counter and diversity estimator for database independent property analysis (DIPA)  obtained from metagenomic and/or metatranscriptomic sequencing data\n\n![GitHub Logo](mercat_workflow.jpg)\n  \nInstalling MerCat: \n - Available via Anaconda: Enable the BioConda repo and run `conda install mercat`  \n - We do not have a pip installer available as of now. 
If you would like to use pip, please install the \n   modules listed in `dependencies.txt` via pip and run `python setup.py install` to set up mercat.\n \nUsage:\n-----\n * -i I        path-to-input-file\n * -f F        path-to-folder-containing-input-files\n * -k K        kmer length\n * -n N        number of cores [default = all]\n * -c C        minimum kmer count [default = 10]\n * -pro        run mercat on protein input file specified as .faa \n * -q          tell mercat that the input file contains raw nucleotide reads (.fq or .fastq)\n * -p          run prodigal on nucleotide assembled contigs. Must be one of ['.fa', '.fna', '.ffn', '.fasta']\n * -t [T]      Trimmomatic options\n * -s          Data split size for large files (default is 100 MB file size) \n * -h, --help  show this help message\n\n\nBy default, mercat assumes that the input provided is one of ['.fa', '.fna', '.ffn', '.fasta']\n\n\u003e Example: To compute all 3-mers, run `mercat -i test.fa -k 3 -n 8 -c 10 -p`          \n \n The above command:\n* Runs prodigal on `test.fa`, then runs mercat on the resulting protein file.            \n* Results are generally stored in input-file-name_{protein|nucleotide}.csv and input-file-name_{protein|nucleotide}_summary.csv  \n   * `test_protein.csv` and `test_protein_summary.csv` in this example  \n* `test_protein_summary.csv` contains kmer frequency count, pI, Molecular Weight, and Hydrophobicity metrics for all unique kmers across all sequences in `test.fa`\n* `test_protein_diversity_metrics.txt` contains the alpha diversity metrics.\n\n* `test_protein.csv` contains kmer frequency count, pI, Molecular Weight, and Hydrophobicity metrics for individual sequences. 
\n  \u003e NOTE: We disabled the code that generates this file since computing k-mer counts for individual \nsequences was getting very expensive in terms of time \u0026 memory usage for large input files.\n\n\nOther usage examples:\n---------------------\n\n* `mercat -i test.fq -k 3 -n 8 -c 10 -q`  \n   Runs mercat on raw nucleotide reads (.fq or .fastq) \n   \n*  `mercat -i test.fq -k 3 -n 8 -c 10 -q -t`  \n   Runs trimmomatic on raw nucleotide reads (.fq or .fastq), then runs mercat on the trimmed nucleotides\n    \n*  `mercat -i test.fq -k 3 -n 8 -c 10 -q -t 20`  \n   Same as above, but also passes the quality option to trimmomatic\n   \n*  `mercat -i test.fq -k 3 -n 8 -c 10 -q -t 20 -p`\n   Runs trimmomatic on raw nucleotide reads, then runs prodigal on the trimmed reads to produce a protein file, which is then processed by mercat\n      \n*  `mercat -i test.fna -k 3 -n 8 -c 10`  \n   Runs mercat on nucleotide input - one of ['.fa', '.fna', '.ffn', '.fasta']\n    \n*   `mercat -i test.fna -k 3 -n 8 -c 10 -p`  \n    Runs prodigal on nucleotide input, generates a .faa protein file, and runs mercat on it\n    \n*   `mercat -i test.faa -k 3 -n 8 -c 10 -pro`  \n    Runs mercat on a protein input (.faa)\n\n* All the above examples can also be used with the `-f input-folder` option instead of the `-i input-file` option\n  -  Example:  `mercat  -f /path/to/input-folder -k 3 -n 8 -c 10` --- Runs mercat on all inputs in the folder\n  \n* To save working memory (RAM) on low-RAM computers or with \u003e2 GB files, use the `-s` option to split/chunk the file \n  - Example: `mercat -i test.fna -k 3 -n 8 -c 10 -s 50` --- Runs mercat in nucleotide mode, splitting the file into 50 MB pieces \n  \n  \nCiting Mercat\n-------------\nIf you are publishing results obtained using MerCat, please cite:\n\nWhite III RA, Panyala A, Glass K, Colby S, Glaesemann KR, Jansson C, Jansson JK. 
(2017) MerCat: a versatile k-mer counter and diversity estimator for database-independent property analysis obtained from metagenomic and/or metatranscriptomic sequencing data. PeerJ Preprints 5:e2825v1 https://doi.org/10.7287/peerj.preprints.2825v1\n\n\n\nCONTACT\n-------\n\nPlease send all queries to Richard Allen White III  \u003c[rwhit101@uncc.edu](rwhit101@uncc.edu)\u003e or \u003c[raw937@gmail.com](raw937@gmail.com)\u003e \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpnnl%2Fmercat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpnnl%2Fmercat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpnnl%2Fmercat/lists"}