https://github.com/lhncbc/ctb

# CTB - Custom Taxonomy Builder

## Description

Given a list of terms and a set of UMLS files, the CTB generates a
subset of the UMLS containing the supplied terms and their word-based
variants.

## Inputs

The following files should be placed in the data/input directory:

+ MRCONSO.RRF concepts file
+ MRSTY.RRF concept -> semantic types file

The following is supplied through the web interface:

+ list of supplied terms

## Outputs

+ Custom version of mrconso.rrf
+ Custom version of mrsty.rrf
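
To spot-check the custom mrconso.rrf, a minimal sketch like the following lists the distinct term strings it contains. It assumes the custom file keeps the standard pipe-delimited MRCONSO layout, in which the string (STR) is the 15th field. The function name `list_terms` is hypothetical and is not part of CTB.

```shell
# Hypothetical helper (not part of CTB): print the distinct term strings
# found in an MRCONSO-style file.
# Assumption: the file uses the standard pipe-delimited MRCONSO layout,
# where the 15th field (STR) holds the term string.
list_terms() {
    awk -F'|' 'NF >= 15 { print $15 }' "$1" | sort -u
}
```

For example, `list_terms mrconso.rrf | head` would show the first few strings in sorted order.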

## Usage

To use CTB you must first create indexes of your UMLS files and then
start the tool.

### Prepare Knowledge Sources

Copy MRCONSO.RRF and MRSTY.RRF to ctb/data/input/*your data set name*/.

In the ctb directory run:

    bin/prepumls.sh 'your data set name'

For example:

    bin/prepumls.sh 2016AA
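
The copy step can be scripted so that a missing input file is caught before indexing starts. The following is a minimal sketch, not part of CTB: the function name `stage_umls` and its argument order are hypothetical, and it only assumes the data/input layout described above.

```shell
# Hypothetical helper (not part of CTB): copy the two UMLS files into
# CTB_DIR/data/input/DATASET, failing early if either file is missing.
# Usage: stage_umls SRC_DIR CTB_DIR DATASET
stage_umls() {
    src=$1
    ctb=$2
    dataset=$3
    for f in MRCONSO.RRF MRSTY.RRF; do
        if [ ! -f "$src/$f" ]; then
            echo "missing input file: $src/$f" >&2
            return 1
        fi
    done
    mkdir -p "$ctb/data/input/$dataset" || return 1
    cp "$src/MRCONSO.RRF" "$src/MRSTY.RRF" "$ctb/data/input/$dataset/"
}
```

From the ctb directory, `stage_umls /path/to/umls . 2016AA && bin/prepumls.sh 2016AA` would stage the files and then build the indexes; `/path/to/umls` is a placeholder for wherever your UMLS files live.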

Note: When using the GitHub release, the name and path of the
standalone jar vary with the version in the project.clj file and the
version of Leiningen used. The CLASSPATH variable in the script
bin/prepumls.sh must be modified to match the current location of the
standalone jar (or uberjar).
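
Because the jar name changes between builds, it can help to look the path up rather than type it. The sketch below is an assumption-laden helper, not part of CTB: the name `find_standalone_jar` is hypothetical, and it assumes the jar's file name ends in `-standalone.jar` and sits directly in the given directory (`target` by default). Some Leiningen setups place the uberjar elsewhere, so adjust the directory as needed.

```shell
# Hypothetical helper (not part of CTB): print the path of the first
# *-standalone.jar found directly inside a directory (default: target).
# Assumption: the uberjar's file name ends in "-standalone.jar".
find_standalone_jar() {
    dir=${1:-target}
    for jar in "$dir"/*-standalone.jar; do
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    echo "no standalone jar found in $dir" >&2
    return 1
}
```

The printed path is the value to use when updating the CLASSPATH setting in bin/prepumls.sh.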

### Update the system configuration file

There should be a file called ctb.properties in the `config`
directory. In ctb.properties change:

    ctb.ivf.dataroot: ...

to:

    ctb.ivf.dataroot: data/ivf/
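
If you prefer to script this change, the following minimal sketch rewrites a single setting in place. It is not part of CTB: the function name `set_property` is hypothetical, and it assumes each setting occupies one line that starts with the key followed directly by `:` or `=`, as in the lines shown above.

```shell
# Hypothetical helper (not part of CTB): set KEY to VALUE in a properties
# file, replacing any existing line for that key.
# Assumption: settings are single lines of the form "key: value" or
# "key=value" with the key at the start of the line.
# Usage: set_property FILE KEY VALUE
set_property() {
    file=$1
    key=$2
    value=$3
    [ -f "$file" ] || return 1
    tmp=$(mktemp) || return 1
    # Keep every line that does not begin with "KEY:" or "KEY=".
    awk -v k="$key" 'index($0, k ":") != 1 && index($0, k "=") != 1' "$file" > "$tmp"
    # Append the new setting in the "key: value" form used by ctb.properties.
    printf '%s: %s\n' "$key" "$value" >> "$tmp"
    # Write back through redirection so the file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, `set_property config/ctb.properties ctb.ivf.dataroot data/ivf/` applies the change described above. Note that the updated line is moved to the end of the file.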

### Adding LVG to configuration file for term expansion

If you want to use the Lexical Tools Lexical Variant Generator (LVG)
to supply term combinations not found in the UMLS, download LVG
from the Lexical Systems Group website
(https://lsg3.nlm.nih.gov/LexSysGroup/Projects/lvg/current/web/index.html)
and install it according to its directions. After installing the
Lexical Tools, add the following to the ctb.properties file:

    ctb.lvg.directory: {LVGDIR}

where LVGDIR is the location of your LVG installation.

### Missing directories when using GitHub release

If you are using the GitHub release of CTB, you will need to create a
directory for the output:

    mkdir -p resources/public/output

### Start up system

In the top-level ctb directory run:

    java -jar target/ctb-0.1.3-SNAPSHOT-standalone.jar [port]

Note: When using the GitHub release, the name and path of the
standalone jar vary with the version in the project.clj file and the
version of Leiningen used.

Or, if you have Leiningen:

    lein ring server [port]

Then point your web browser to localhost:3000 (or, if you supplied a
port number, to that port).

### Supply Term List

Paste your term list into the "Input Terms" (first) page and press
"Submit".

### Filter synonyms

Select or de-select terms in Synonym Set View to filter the synonyms
generated by the tool and press "Submit".

### Generate Data Set

The generated dataset will be placed in the directory
resources/public/output/user//.

The directory should contain the following files:

    filtered-synset
    filtered-termlist.edn
    mrconso.rrf
    mrsty.rrf
    params
    synonyms.checksum
    termlist
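
The file list above can be checked mechanically after a run. This is a minimal sketch, not part of CTB: the function name `check_ctb_output` is hypothetical, and it only tests that each listed file exists, not that its contents are valid.

```shell
# Hypothetical helper (not part of CTB): report which of the expected
# output files are absent from a generated data set directory.
# Returns 0 when all files are present, non-zero otherwise.
# Usage: check_ctb_output DIR
check_ctb_output() {
    dir=$1
    missing=0
    for f in filtered-synset filtered-termlist.edn mrconso.rrf mrsty.rrf \
             params synonyms.checksum termlist; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Call it with the path of the generated data set directory, for example `check_ctb_output path/to/generated/dataset`, where the path is a placeholder for your own output location.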

## For Users of the GitHub release

You will need both [Leiningen](https://leiningen.org/) and
[Maven](https://maven.apache.org/) installed.

The irutils 2.1 inverted file library is necessary to use the latest
version of CTB. In a separate directory, clone, compile, and install
irutils version 2.1 into your local Maven (and Leiningen) repository:

    $ git clone https://github.com/willjrogers/irutils.git
    $ cd irutils/java
    $ git branch rel2.1 rel-2.1
    $ git checkout rel2.1
    $ mkdir -p src/main
    $ (cd src/main && ln -s ../../sources java)
    $ mvn install

Go to the "ctb" directory, then compile and package CTB:

    $ cd ctb
    $ lein uberjar

If the uberjar builds successfully, the steps in the usage section
above should work normally.

## For Developers

### Running the system in Apache Tomcat

If you have Tomcat, you can use the file
target/ctb-0.1.0-SNAPSHOT-standalone.war to deploy the system to
Tomcat.

The application now expects the config directory containing
ctb.properties and the data directory containing the indexes to be in
the sub-directory war-resources before the WAR file is built with the
command `lein ring uberwar`.

Note: CTB has not been extensively tested in Tomcat and may require
modification to work properly.

## License

CTB is a product of the U.S. Government and is not subject to copyright.

For more information see:
http://www.usa.gov/government-works