Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/lhncbc/metamaplite

A near real-time named-entity recognizer
https://github.com/lhncbc/metamaplite

metamap named-entity-recognition umls

Last synced: about 1 month ago
JSON representation

A near real-time named-entity recognizer

Awesome Lists containing this project

README

        

## A BioC XML to A BioC XML implementation of MetaMapLite

The class gov.nih.nlm.nls.metamap.lite.BioCProcess allows MetaMapLite
to process BioC XML input and write the results in BioC XML.

### Using from Java

Below is an example using BioC Processing with MetaMapLite

// Initialize MetaMapLite
Properties defaultConfiguration = getDefaultConfiguration();
String configPropertiesFilename =
System.getProperty("metamaplite.propertyfile",
"config/metamaplite.properties");
Properties configProperties = new Properties();
// set any in-line properties here.
FileReader fr = new FileReader(configPropertiesFilename);
configProperties.load(fr);
fr.close();

configProperties.setProperty("metamaplite.semanticgroup",
"acab,anab,bact,cgab,dsyn,emod,inpo,mobd,neop,patf,sosy");
Properties properties =
Configuration.mergeConfiguration(configProperties,
defaultConfiguration);
BioCProcess process = new BioCProcess(properties);

// read BioC XML collection
Reader inputReader = new FileReader(inputFile);
BioCFactory bioCFactory = BioCFactory.newFactory("STANDARD");
BioCCollectionReader collectionReader =
bioCFactory.createBioCCollectionReader(inputReader);
BioCCollection collection = collectionReader.readCollection();

// Run named entity recognition on collection
BioCCollection newCollection = process.processCollection(collection);

// write out the annotated collection
File outputFile = new File(outputFilename);
Writer outputWriter = new PrintWriter(outputFile, "UTF-8");
BioCCollectionWriter collectionWriter = bioCFactory.createBioCCollectionWriter(outputWriter);
collectionWriter.writeCollection(newCollection);
outputWriter.close();

This process attempts to preserve any annotation present in the
original BioC XML input. Some annotations applied to BioC
structures before entity lookup, including `tokenization and
part-of-speech-tagging, are discarded before the writing out the final
annotated collection. This occurs in the entity lookup class
BioCEntityLookup (java package: gov.nih.nlm.nls.metamap.lite).

### Omissions in BioC version

The current version of the BioCProcess class does not call the
abbreviation detector or the negation detector. This should be
relatively simple to add (particularly the abbreviation detector) and
will probably be added in the next release.