https://github.com/pnnl/mercat
MerCat: python code for versatile k-mer counting and diversity estimation for database independent property analysis for meta -ome data
https://github.com/pnnl/mercat
dask database-independent-analysis diversity diversity-estimation divideandconquer fastq k-mer-counting k-mer-frequency kmer-frequency-count metagenomic-analysis metatranscriptomic-analysis nucleotides plotly protein python
Last synced: 17 days ago
JSON representation
MerCat: python code for versatile k-mer counting and diversity estimation for database independent property analysis for meta -ome data
- Host: GitHub
- URL: https://github.com/pnnl/mercat
- Owner: pnnl
- License: other
- Created: 2017-02-10T14:45:02.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2022-11-30T15:50:51.000Z (over 2 years ago)
- Last Synced: 2025-03-26T00:41:31.190Z (about 1 month ago)
- Topics: dask, database-independent-analysis, diversity, diversity-estimation, divideandconquer, fastq, k-mer-counting, k-mer-frequency, kmer-frequency-count, metagenomic-analysis, metatranscriptomic-analysis, nucleotides, plotly, protein, python
- Language: Python
- Homepage: https://github.com/pnnl/mercat
- Size: 2.7 MB
- Stars: 18
- Watchers: 5
- Forks: 13
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# This version of MerCat is depreciated please use the new updated version
# https://github.com/raw-lab/mercat2===========================================================================================
###### MerCat: python code for versatile k-mer counter and diversity estimator for database independent property analysis (DIPA) obtained from metagenomic and/or metatranscriptomic sequencing data

Installing MerCat:
- Available via Anaconda: Enable BioConda repo and run `conda install mercat`
- We do not have a pip installer available as of now. If you would like to use pip, please install the
modules listed in `dependencies.txt` via pip and run `python setup.py install` for setting up mercat.
Usage:
-----
* -i I path-to-input-file
* -f F path-to-folder-containing-input-files
* -k K kmer length
* -n N no of cores [default = all]
* -c C minimum kmer count [default = 10]
* -pro run mercat on protein input file specified as .faa
* -q tell mercat that input file provided are raw nucleotide reads as [.fq, .fastq]
* -p run prodigal on nucleotide assembled contigs. Must be one of ['.fa', '.fna', '.ffn', '.fasta']
* -t [T] Trimmomatic options
* -s Data split size for large files (default is 100 Mb file size)
* -h, --help show this help messageBy default mercat assumes that inputs provided is one of ['.fa', '.fna', '.ffn', '.fasta']
> Example: To compute all 3-mers, run `mercat -i test.fa -k 3 -n 8 -c 10 -p`
The above command:
* Runs prodigal on `test.fa`, then runs mercat on the resulting protein file.
* Results are generally stored in input-file-name_{protein|nucleotide}.csv and input-file-name_{protein|nucleotide}_summary.csv
* `test_protein.csv` and `test_protein_summary.csv` in this example
* `test_protein_summary.csv` contains kmer frequency count, pI, Molecular Weight, and Hydrophobicity metrics for all unique kmers across all sequences in `test.fa`
* `test_protein_diversity_metrics.txt` containing the alpha diversity metrics.* `test_protein.csv` contains kmer frequency count, pI, Molecular Weight, and Hydrophobicity metrics for individual sequences.
> NOTE: We disabled the code that generates this file since computing k-mer counts for individual
sequences was getting very expensive in terms of time & memory usage for large input files.Other usage examples:
---------------------* `mercat -i test.fq -k 3 -n 8 -c 10 -q`
Runs mercat on raw nucleotide read (.fq or .fastq)
* `mercat -i test.fq -k 3 -n 8 -c 10 -q -t`
Runs trimmomatic on raw nucleotide reads (.fq or .fastq), then runs mercat on the trimmed nucleotides
* `mercat -i test.fq -k 3 -n 8 -c 10 -q -t 20`
Same as above but can provide the quality option to trimmomatic
* `mercat -i test.fq -k 3 -n 8 -c 10 -q -t 20 -p`
Run trimmomatic on raw nucleotide reads, then run prodigal on the trimmed read to produce a protein file which is then processed by mercat
* `mercat -i test.fna -k 3 -n 8 -c 10`
Run mercat on nucleotide input - one of ['.fa', '.fna', '.ffn', '.fasta']
* `mercat -i test.fna -k 3 -n 8 -c 10 -p`
Run prodigal on nucleotide input, generate a .faa protein file and run mercat on it
* `mercat -i test.faa -k 3 -n 8 -c 10 -pro`
Run mercat on a protein input (.faa)* All the above examples can also be used with `-f input-folder` instead of `-i input-file` option
- Example: `mercat -f /path/to/input-folder -k 3 -n 8 -c 10` --- Runs mercat on all inputs in the folder
* To save working memory (RAM) on low RAM computers or >2 GB files use '-s option' to split/chunk the file
- Example: `mercat -i test.fna -k 3 -n 8 -c 10 -s 50` --Runs mercat in nucleotide mode splitting file into 50 MB pieces
Citing Mercat
-------------
If you are publishing results obtained using MerCat, please cite:White III RA, Panyala A, Glass K, Colby S, Glaesemann KR, Jansson C, Jansson JK. (2017) MerCat: a versatile k-mer counter and diversity estimator for database-independent property analysis obtained from metagenomic and/or metatranscriptomic sequencing data. PeerJ Preprints 5:e2825v1 https://doi.org/10.7287/peerj.preprints.2825v1
CONTACT
-------Please send all queries to Richard Allen White III <[[email protected]]([email protected])> or <[[email protected]]([email protected])>