https://github.com/nylander/consensus-sequence
- Host: GitHub
- URL: https://github.com/nylander/consensus-sequence
- Owner: nylander
- License: mit
- Created: 2019-05-08T09:34:27.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2023-09-19T15:24:11.000Z (over 2 years ago)
- Last Synced: 2023-09-19T19:09:33.646Z (over 2 years ago)
- Language: Perl
- Size: 15.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Calculate consensus sequence from fasta or CSV
- Last modified: Tue Sep 19, 2023 05:23
- Sign: JN
## Description
Two scripts for calculating consensus or compromise (DNA) sequences from either fasta or csv input format.
### consensus-seq.pl
FILE: consensus-seq.pl
USAGE: ./consensus-seq.pl fasta-alignment-file
DESCRIPTION: Calculate consensus sequence from fasta alignment. Can use a consensus level (1-100), or represent a (strict) consensus using IUPAC symbols.
Prints to STDOUT or, if --outfile is used, to an outfile.
Fasta sequence is wrapped to the width set by -w (default 80), unless --nowrap is used.
Default fasta header will be based on infile name, and will display some extra information. For example:
>in.fas|consensus [conlevel=100 nseq=8 length=3844 identity=88.93]
The header can be overridden by using -l.
Regarding consensus level (taken from Bio::Align::AlignI):
"The consensus residue has to appear at least threshold % of the sequences at a given location, otherwise a '?' character will be placed at that location."
OPTIONS: -i,--infile= Provide fasta formatted sequence alignment.
-o,--outfile= Provide output file name (will be fasta format).
-c,--conlevel= Provide consensus level (1-100). Default '50'.
-s,--strict Synonym for -c=100. Overrides -c.
-l,--label= Provide custom fasta header.
-w,--wrap= Set max line length in sequence to the given width. Default is 80.
-n,--nowrap Do not wrap (interleave) sequence string.
-I,--IUPAC Represent all ambiguities as IUPAC symbols in the (strict) consensus.
REQUIREMENTS: BioPerl, perldoc
NOTES: ---
AUTHOR: Johan Nylander
COMPANY: NRM
VERSION: 1.0
CREATED: 2019-09-25 09:42:04
REVISION: ---
LICENSE: MIT
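
To make the consensus level, the IUPAC option, and the line wrapping described above more concrete, here is a minimal Python sketch. It is not taken from consensus-seq.pl (which is Perl and relies on BioPerl); the function names `threshold_consensus`, `iupac_consensus`, and `wrap` are hypothetical, and the handling of ties, gaps, and lower-case input is an assumption that may differ from what BioPerl does.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases each one represents.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def check_alignment(seqs):
    """Raise ValueError unless seqs is a non-empty list of equal-length strings."""
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("expected a non-empty alignment of equal-length sequences")


def threshold_consensus(seqs, conlevel=50):
    """Per column, keep the most common residue if it reaches conlevel percent
    of the sequences; otherwise emit '?'.
    Assumption: ties go to the residue seen first in the column."""
    check_alignment(seqs)
    consensus = []
    for column in zip(*seqs):
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue if 100 * count >= conlevel * len(seqs) else "?")
    return "".join(consensus)


def iupac_consensus(seqs):
    """Per column, emit the IUPAC code for the set of bases observed.
    Assumption: columns with gaps or non-ACGT symbols become '?'."""
    check_alignment(seqs)
    columns = zip(*(s.upper() for s in seqs))
    return "".join(IUPAC.get(frozenset(col), "?") for col in columns)


def wrap(seq, width=80):
    """Break a sequence string into lines of at most `width` characters."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "ACTA"]
    print(threshold_consensus(seqs, conlevel=50))   # ACGA
    print(threshold_consensus(seqs, conlevel=100))  # AC??
    print(iupac_consensus(seqs))                    # ACKW
```

With a consensus level of 100, any column that is not identical across all sequences becomes '?', which corresponds to the strict consensus; the IUPAC variant instead encodes which bases were observed in that column.
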
### consensus-seq-from-csv.pl
FILE: consensus-seq-from-csv.pl
USAGE: ./consensus-seq-from-csv.pl R84150_2009.duplicates.csv
./consensus-seq-from-csv.pl -d R84150_2009.duplicates.csv
./consensus-seq-from-csv.pl -s ';' R84150_2009.duplicates.txt
./consensus-seq-from-csv.pl -s ';' -d -nu dna.duplicates.txt
DESCRIPTION: Read csv file and print a "conservative consensus".
That is, if a site is polymorphic, a question mark (or the symbol set with -m) is printed.
Reads a file, prints to stdout.
OPTIONS: -d, --debug Will show the input sequences and consensus aligned.
-s, --separator Define input (and output) separator (default ',').
-m, --missing Define consensus symbol. Default '?'.
-n, --nucleotide Input is nucleotides. Will accept IUPAC ambiguity symbols.
-l, --label Sequence label. If empty, will try to use file name.
-f, --fasta Input is fasta.
-h, --help Will show brief help text
REQUIREMENTS: ---
NOTES: Uses the first part of the filename (.duplicates.txt) as
output sequence name, unless a label is given with -l.
TODO: * Support fasta input
* Support AA input
AUTHOR: Johan Nylander
COMPANY: NRM
VERSION: 1.0
CREATED: 03/12/2019 11:02:49 AM
REVISION: ---
LICENSE: MIT
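
The "conservative consensus" described above can be sketched in a few lines of Python. This is an illustration only, not the logic of consensus-seq-from-csv.pl: the function names are hypothetical, and the input layout (one sequence per row, a label in the first field, one character state per remaining field) is an assumption, since the README does not specify the CSV layout.

```python
import csv
import io


def conservative_consensus(rows, missing="?"):
    """Per column, keep the state if all rows agree; otherwise emit `missing`."""
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("expected a non-empty set of equal-length rows")
    return [col[0] if len(set(col)) == 1 else missing for col in zip(*rows)]


def consensus_from_csv(text, separator=",", missing="?"):
    """Parse separator-delimited text and return the consensus as one delimited line.
    Assumption: the first field of each row is a sequence label and is skipped."""
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    rows = [row[1:] for row in reader if row]
    return separator.join(conservative_consensus(rows, missing))


if __name__ == "__main__":
    data = "s1;A;C;G\ns2;A;T;G\n"
    print(consensus_from_csv(data, separator=";"))  # A;?;G
```

The second site differs between the two rows, so it is replaced by the missing symbol, while the sites on which all rows agree are kept.
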