https://github.com/gaurav/taxondna

Taxonomy-aware DNA sequence processing toolkit

Readme for TaxonDNA 1.8
=======================

CONTENTS
1. System requirements
2. Individual tools
3. Building software
4. Memory requirements
5. File formats
6. Further information
7. References

1. SYSTEM REQUIREMENTS

TaxonDNA is written entirely in Java, with its interface written using
either Java AWT or Swing. Its only requirement is a standard Java VM.
Since TaxonDNA 1.6 we no longer support Java 1.4, so you'll need at
least Java 1.5.

There is a long list of bugs [1], some of which prevent TaxonDNA from
working well on multiple platforms. Note particularly UI-related issues:
some UI elements only work on particular platforms, although we try to
work around them as well as we can.

2. INDIVIDUAL TOOLS

TaxonDNA consists of three tools: Species Identifier [2], SequenceMatrix,
and GenBank Explorer (GBX, still in beta). JNLP files for the first two
are available in this repository (but you'll have to get someone to
upload the latest JAR files to SourceForge), as well as MS-DOS batch
files, if you prefer those. You should be able to get GBX working by running:
cd build/classes
java -Xmx1024M com.ggvaidya.TaxonDNA.GenBankExplorer.GenBankExplorer

3. BUILDING SOFTWARE

You will need the javac compiler, a copy of the Java runtime and the
Ant build tool in order to build TaxonDNA. The Ant build.xml should
contain all the necessary instructions to build any of the components
you need. A plain "ant" will compile all the source; additional
targets are available to package TaxonDNA.

4. MEMORY REQUIREMENTS

TaxonDNA unfortunately uses a large amount of memory per sequence.
It can handle larger files fairly well, but by default, Java
applications are limited to 64 MB of memory. You will need to use the
'-Xmx' option to raise this limit. The easiest way to do this is to
enter, at the command line:
java -Xmx1024M -jar TaxonDNA.jar

You might also have to adjust the value of 1024 megabytes depending
on the size of your dataset and on the memory available on your
computer. Java will refuse to run if the Xmx value specified is too
large. Also, if TaxonDNA.jar is not in the current directory, you
might need to enter the complete path to TaxonDNA.jar, like so:
java -Xmx1024M -jar C:\TaxonDNA\TaxonDNA.jar

If the program runs out of memory while running, it will appear to
"hang" and become completely unresponsive. There is at present no
way to recover from this state. You'll want to increase your memory
setting using "-Xmx" as shown above.

Please note that trying to load a sequence file larger than the
memory size specified (i.e. 1024 MB in the above example) will also
result in the program hanging. We have tested this program with a
file of 2,185 sequences with 2,664 base pairs, and it has been known
to work on datasets of up to 8,000 sequences. Please let us know if
memory size is an issue for you, and we will try to incorporate
workarounds in future versions.
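
Because an out-of-memory hang cannot be recovered from, it can help to
check the heap limit before loading a large file. The following is a
minimal sketch and is not part of TaxonDNA; the class name HeapCheck and
the method mightFit are made up for illustration. It reports the limit
the JVM actually received from '-Xmx' and compares it with the size of a
file on disk. Since TaxonDNA needs more memory than the raw file size,
passing this check does not guarantee that a file will load; failing it
means the file is at least as large as the heap.

```java
import java.io.File;

// Hypothetical helper, not part of TaxonDNA.
public class HeapCheck {
    /** Maximum heap the JVM will try to use, in bytes (reflects -Xmx). */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /**
     * Coarse pre-check: returns false when the file is at least as
     * large as the heap. A true result is necessary, not sufficient.
     */
    public static boolean mightFit(long fileBytes, long heapBytes) {
        return fileBytes < heapBytes;
    }

    public static void main(String[] args) {
        long heap = maxHeapBytes();
        System.out.printf("Maximum heap: %d MB%n", heap / (1024 * 1024));

        if (args.length > 0) {
            long size = new File(args[0]).length();
            System.out.printf("File size: %d MB%n", size / (1024 * 1024));
            if (!mightFit(size, heap)) {
                System.out.println("File is larger than the heap; raise -Xmx.");
            }
        }
    }
}
```

Run it with the same '-Xmx' value you intend to use for TaxonDNA, for
example "java -Xmx1024M HeapCheck sequences.fas", where sequences.fas
stands in for your own file.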

5. FILE FORMATS

This program can handle input in FASTA, MEGA, Nexus and TNT formats.
GenBank support is provided through GenBank Explorer. FASTA
is our oldest supported format, and most of the tools are designed
to accept sequences from NCBI GenBank (exported as FASTA) as input.
The program attempts to guess the species name from the FASTA title
string, and considers hyphens ('-') as gaps and question marks ('?')
as missing data. Incidentally, if you use a very different gap/missing
syntax, you could use Nexus as an intermediate format, since it
supports defining your own gap/missing data specifier.
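
To check how a FASTA file will be read under the gap and missing-data
convention described above, a small standalone sketch like the following
may help. It is not TaxonDNA's own parser: the class FastaSketch, its
methods, and the sample record are invented for illustration, and it
targets a current JDK. It does not try to guess species names, since the
rules TaxonDNA uses for that are not described here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not TaxonDNA's parser.
public class FastaSketch {
    /** Reads FASTA text into a title -> sequence map, keeping file order. */
    public static Map<String, String> parse(String fasta) throws IOException {
        Map<String, String> records = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(new StringReader(fasta));
        String title = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                // A new title line closes the previous record.
                if (title != null) records.put(title, seq.toString());
                title = line.substring(1).trim();
                seq.setLength(0);
            } else if (title != null) {
                seq.append(line);
            }
        }
        if (title != null) records.put(title, seq.toString());
        return records;
    }

    /** Counts occurrences of one symbol, e.g. '-' (gap) or '?' (missing). */
    public static int count(String seq, char symbol) {
        int n = 0;
        for (int i = 0; i < seq.length(); i++) {
            if (seq.charAt(i) == symbol) n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // Invented sample record for illustration only.
        String sample = ">Example title\nACGT--A?\nCC?-\n";
        for (Map.Entry<String, String> e : parse(sample).entrySet()) {
            String s = e.getValue();
            System.out.println(e.getKey() + ": length=" + s.length()
                + " gaps=" + count(s, '-') + " missing=" + count(s, '?'));
        }
    }
}
```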

6. FURTHER INFORMATION

If you need more information about this program, or have any
questions or bug reports, please contact us, either via *any* of
our websites:
http://taxondna.sf.net/
http://code.google.com/p/taxondna/
http://github.com/gaurav/taxondna/
http://groups.google.com/group/taxondna

or by e-mail at
gaurav AT ggvaidya DOT com.

7. REFERENCES
[1] TaxonDNA bug list at http://code.google.com/p/taxondna/issues/list
[2] Meier, R., Kwong, S., Vaidya, G., Ng, P. K. L. (2006)
DNA Barcoding and Taxonomy in Diptera: a Tale of High Intraspecific
Variability and Low Identification Success.
Systematic Biology, 55: 715-728.