Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lenards/mr-naims
A simple taxon name cleaner
- Host: GitHub
- URL: https://github.com/lenards/mr-naims
- Owner: lenards
- Created: 2013-01-29T23:18:55.000Z (almost 12 years ago)
- Default Branch: master
- Last Pushed: 2013-02-01T20:28:36.000Z (almost 12 years ago)
- Last Synced: 2025-01-10T12:56:22.113Z (10 days ago)
- Language: Python
- Homepage:
- Size: 304 KB
- Stars: 2
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
mr-naims
========

A simple name cleaner in Python using the [Phylotastic TNRastic API](http://www.evoio.org/wiki/Phylotastic/TNRS) at [Taxosaurus](http://taxosaurus.org).
mr-naims is a [Phylotastic 2](http://evoio.org/wiki/Phylotastic) project.
### Dependencies
* Python 2.7. You should run mr-naims in a [virtualenv](http://www.virtualenv.org/)
* [Requests: HTTP for humans](http://docs.python-requests.org/en/latest/). Install it in your virtualenv with `pip install requests`
* [DendroPy](http://packages.python.org/DendroPy/), for reading Newick and NeXML trees. `pip install dendropy`.

### Usage

    python simple.py [options] -f inputfile

`inputfile` may be a PDF, an image, an Office document, a text file, a Newick tree, or a [NeXML file](http://www.nexml.org) (NeXML support is experimental). Unless you specify `-s`/`--skip-gnrd`, the file is sent to [Global Names Recognition and Discovery](http://gnrd.globalnames.org) to extract a list of scientific names. Run `python simple.py -h` for help.
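When the input is a Newick tree, the names to clean are its tip labels. mr-naims uses DendroPy for this; the stdlib-only sketch below (a hypothetical `newick_leaf_labels` helper, not code from this repository) only illustrates what gets extracted. It skips branch lengths and internal node labels, follows the Newick convention that underscores in unquoted labels stand for spaces, and does not handle `[comments]` or other extensions.

```python
import re

# A token is a quoted label, a structural character, an unquoted word, or whitespace.
_TOKEN = re.compile(r"'(?:[^']|'')*'|[(),;:]|[^(),;:'\s]+|\s+")


def _unquote(token):
    """Turn a raw label token into a name, following the usual Newick conventions."""
    if token.startswith("'"):
        # Quoted label: drop the outer quotes and un-double embedded quotes.
        return token[1:-1].replace("''", "'")
    # Unquoted label: underscores stand for spaces.
    return token.replace("_", " ")


def newick_leaf_labels(newick):
    """Return the tip labels of a Newick tree string, left to right.

    Simplified sketch (not a full Newick parser): branch lengths and
    internal node labels are skipped; [comments] are not supported.
    """
    labels = []
    prev = None            # last structural token seen: '(', ')', ',', ';' or None
    expect_length = False  # True right after ':' (the next word is a branch length)
    for match in _TOKEN.finditer(newick):
        token = match.group(0)
        if token.isspace():
            continue
        if token in ("(", ")", ",", ";"):
            prev = token
            expect_length = False
        elif token == ":":
            expect_length = True
        elif expect_length:
            expect_length = False  # branch length: ignore
        elif prev in (None, "(", ","):
            labels.append(_unquote(token))  # a word after '(' or ',' names a tip
        # a word after ')' is an internal node label: ignore
    return labels
```

For example, `newick_leaf_labels("((Homo_sapiens:0.1,'Pan troglodytes':0.2)Hominini:0.3,Gorilla_gorilla);")` returns `['Homo sapiens', 'Pan troglodytes', 'Gorilla gorilla']`.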
If you are providing a Newick tree, specify the `-n` option. To limit the TNRS search to a specific provider, use the `--source` option, e.g. `--source MSW3`.
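The documented options can be summarized as an `argparse` parser. This is a sketch, not the parser in `simple.py`: the flags come from this README, but the destination names (`inputfile`, `newick`, `source`) and help strings are assumptions, and `-f` and `-n` may have long forms that are not documented here.

```python
import argparse


def build_parser():
    # Flags mirror the ones documented in the README; dest names and
    # help text are illustrative assumptions, not copied from simple.py.
    parser = argparse.ArgumentParser(description="Clean a list of taxon names via TNRS.")
    parser.add_argument("-f", dest="inputfile", required=True,
                        help="PDF, image, Office document, text file, Newick tree or NeXML file")
    parser.add_argument("-s", "--skip-gnrd", action="store_true",
                        help="do not send the input file to GNRD for name extraction")
    parser.add_argument("-n", dest="newick", action="store_true",
                        help="treat the input file as a Newick tree")
    parser.add_argument("--source", default=None,
                        help="limit the TNRS search to one provider, e.g. MSW3")
    return parser
```

With this parser, `-f tree.nwk -n --source MSW3` yields `inputfile="tree.nwk"`, `newick=True`, `source="MSW3"` and `skip_gnrd=False`.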
The file `test-set.txt` is included as an example list of names.
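A plain name list such as `test-set.txt` can be loaded with a few lines of Python. The sketch assumes one name per line (an assumption about the file's layout), and `read_names` is a hypothetical helper rather than part of mr-naims. It trims whitespace, drops blank lines and skips exact duplicates while keeping the original order.

```python
def read_names(lines):
    """Collect names from an iterable of lines (e.g. an open file), one name per line."""
    seen = set()
    names = []
    for line in lines:
        name = " ".join(line.split())  # trim ends and collapse internal whitespace
        if name and name not in seen:  # skip blank lines and repeated names
            seen.add(name)
            names.append(name)
    return names
```

Usage: `with open("test-set.txt") as handle: names = read_names(handle)`.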
mr-naims produces an `inputfile.clean` file containing the cleaned list of names, and outputs a CSV report that includes the match score and provenance of each result.
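The two outputs can be sketched with the standard `csv` module. The record fields (`submitted_name`, `accepted_name`, `score`, `source`) and the fallback to the submitted name when nothing matched are assumptions for illustration; the columns of the real report may differ. The sketch targets Python 3 (under Python 2.7 the CSV file would be opened in binary mode).

```python
import csv

# Hypothetical result fields; not necessarily what simple.py writes.
FIELDS = ["submitted_name", "accepted_name", "score", "source"]


def write_outputs(results, clean_out, report_out):
    """Write cleaned names (one per line) to clean_out and a CSV report to report_out."""
    writer = csv.DictWriter(report_out, fieldnames=FIELDS)
    writer.writeheader()
    for row in results:
        writer.writerow(row)  # match score and provenance go into the report
        # Assumption: keep the submitted name when TNRS returned no accepted name.
        clean_out.write((row["accepted_name"] or row["submitted_name"]) + "\n")
```

Usage: `with open("names.txt.clean", "w") as clean, open("report.csv", "w", newline="") as report: write_outputs(results, clean, report)`; passing `sys.stdout` as `report_out` prints the report instead.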
### Known Issues
There are known issues with Unicode names at the TNRS level.
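One possible workaround, not part of mr-naims, is to flag names that contain non-ASCII characters before submission and optionally try an ASCII approximation. The helpers below are hypothetical; stripping diacritics via NFKD decomposition is lossy, so any transliterated name should be checked by hand.

```python
import unicodedata


def has_non_ascii(name):
    """True if the name contains any character outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in name)


def ascii_fallback(name):
    """Strip diacritics by decomposing (NFKD) and dropping non-ASCII code points.

    Lossy: characters with no ASCII decomposition are removed entirely.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

For example, `ascii_fallback("Aëdes aegypti")` returns `"Aedes aegypti"`.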