https://github.com/lhncbc/ctb
Custom Taxonomy Builder
- Host: GitHub
- URL: https://github.com/lhncbc/ctb
- Owner: LHNCBC
- License: other
- Created: 2019-01-02T17:40:59.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2022-12-10T00:19:40.000Z (about 3 years ago)
- Last Synced: 2025-01-28T04:30:24.090Z (about 1 year ago)
- Topics: clojure, lvg, taxonomy-construction, umls
- Language: Clojure
- Size: 246 KB
- Stars: 2
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# CTB - Custom Taxonomy Builder
## Description
Given a list of terms and a set of UMLS files, the CTB generates a
subset of the UMLS containing the supplied terms and their word-based
variants.
## Inputs
The following files should be placed in the data/input directory:
+ MRCONSO.RRF concepts file
+ MRSTY.RRF concept -> semantic types file
Supplied through the web interface:
+ list of supplied terms
## Outputs
+ Custom version of mrconso.rrf
+ Custom version of mrsty.rrf
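As a rough illustration of what a "custom" mrconso.rrf and mrsty.rrf pair means, the sketch below filters the two files by a term list. It is not CTB's implementation: CTB works from inverted-file indexes and adds word-based variants, whereas the hypothetical `subset_umls` helper here does a plain case-insensitive exact match on the term string. The column positions (CUI first in both files, STR fifteenth in MRCONSO.RRF) follow the standard pipe-delimited UMLS RRF layout.

```python
# Column positions in the pipe-delimited UMLS RRF files.
MRCONSO_CUI = 0   # concept unique identifier
MRCONSO_STR = 14  # term string
MRSTY_CUI = 0     # concept unique identifier


def split_rrf(line):
    """Split one RRF record into fields.

    Every RRF record ends with a trailing '|', so the final empty
    element produced by split() is dropped.
    """
    fields = line.rstrip("\n").split("|")
    if fields and fields[-1] == "":
        fields = fields[:-1]
    return fields


def subset_umls(terms, mrconso_lines, mrsty_lines):
    """Return (mrconso_rows, mrsty_rows) restricted to the supplied terms.

    Hypothetical helper: exact, case-insensitive string match only.
    It does not generate variants the way CTB does.
    """
    wanted = {t.strip().lower() for t in terms if t.strip()}

    # Keep MRCONSO rows whose term string is one of the supplied terms.
    conso_rows = [
        line for line in mrconso_lines
        if split_rrf(line)[MRCONSO_STR].lower() in wanted
    ]

    # Keep MRSTY rows for the concepts that survived the first filter.
    cuis = {split_rrf(line)[MRCONSO_CUI] for line in conso_rows}
    sty_rows = [
        line for line in mrsty_lines
        if split_rrf(line)[MRSTY_CUI] in cuis
    ]
    return conso_rows, sty_rows
```

Both arguments can be open file handles, so the full files never need to be held in memory at once apart from the matching rows.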
## Usage
To use CTB you must first create indexes of your UMLS files and then
start the tool.
### Prepare Knowledge Sources
Copy MRCONSO.RRF and MRSTY.RRF to ctb/data/input/*your data set name*/.
In the ctb directory run:

    bin/prepumls.sh 'your data set name'

For example:

    bin/prepumls.sh 2016AA

Note: When using the GitHub release, the name and path of the standalone
jar will vary based on the version in the project.clj file and the version
of Leiningen used. The CLASSPATH variable in the script
bin/prepumls.sh must be modified to match the current location of the
standalone jar (or uberjar).
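The copy step can also be scripted. The following is a minimal sketch, assuming the directory layout described above (data/input/ followed by the data set name); the `stage_inputs` function and its parameter names are hypothetical and are not part of CTB.

```python
import shutil
from pathlib import Path

# The two UMLS files CTB expects in the input directory.
UMLS_FILES = ("MRCONSO.RRF", "MRSTY.RRF")


def stage_inputs(umls_dir, ctb_root, dataset):
    """Copy the RRF files into <ctb_root>/data/input/<dataset>/.

    Hypothetical helper. Raises FileNotFoundError before copying
    anything if either source file is missing.
    """
    sources = [Path(umls_dir) / name for name in UMLS_FILES]
    for source in sources:
        if not source.is_file():
            raise FileNotFoundError(f"missing UMLS file: {source}")

    target = Path(ctb_root) / "data" / "input" / dataset
    target.mkdir(parents=True, exist_ok=True)
    for source in sources:
        shutil.copy2(source, target / source.name)
    return target
```

After staging, bin/prepumls.sh is run with the same data set name, as shown earlier in this section.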
### Update the system configuration file
There should be a file called ctb.properties in the `config`
directory. In ctb.properties change:

    ctb.ivf.dataroot: ...

to:

    ctb.ivf.dataroot: data/ivf/

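To confirm the setting took effect, the properties file can be read back programmatically. The sketch below assumes only simple one-line `key: value` (or `key=value`) entries; it ignores the escapes and line continuations that the full Java properties format allows. The `read_properties` and `check_dataroot` names are hypothetical.

```python
def read_properties(text):
    """Parse simple 'key: value' or 'key=value' lines into a dict.

    Simplified sketch: comment lines starting with '#' or '!' and
    blank lines are skipped; escapes and continuations are not handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split at whichever separator appears first on the line.
        positions = [i for i in (line.find(":"), line.find("=")) if i != -1]
        if not positions:
            continue
        cut = min(positions)
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


def check_dataroot(props, expected="data/ivf/"):
    """Return True if ctb.ivf.dataroot has the value given above."""
    return props.get("ctb.ivf.dataroot") == expected
```

A typical use is `check_dataroot(read_properties(open("config/ctb.properties").read()))`, run from the top-level ctb directory.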
### Adding LVG to configuration file for term expansion
If you want to use the Lexical Tools Lexical Variant Generator (LVG)
to supply term combinations not found in the UMLS, download LVG
from the Lexical Systems Group website
(https://lsg3.nlm.nih.gov/LexSysGroup/Projects/lvg/current/web/index.html)
and install it according to its directions. After installing the
Lexical Tools, add the following to the ctb.properties file:

    ctb.lvg.directory: {LVGDIR}

where LVGDIR is the location of your LVG installation.
### Missing directories when using GitHub release
If you are using the GitHub release of CTB, you will need to create a
directory for the output:

    mkdir -p resources/public/output

### Start up system
In the top-level ctb directory run:

    java -jar target/ctb-0.1.3-SNAPSHOT-standalone.jar [port]

Note: When using the GitHub release, the name and path of the standalone
jar will vary based on the version in the project.clj file and the version
of Leiningen used.

Or, if you have Leiningen:

    lein ring server [port]

Then point your web browser to localhost:3000 (or, if you supplied a
port number, to that port).
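Before opening the browser, it can be useful to check that the server is actually accepting connections. This is a minimal sketch using only a TCP connection attempt; it assumes the default port 3000 mentioned above and does not verify that the process listening there is CTB. The `ctb_is_listening` name is hypothetical.

```python
import socket

DEFAULT_PORT = 3000  # port used when none is supplied on the command line


def ctb_is_listening(port=DEFAULT_PORT, host="localhost", timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution errors.
        return False
```

If the server was started with an explicit port, pass the same number as the `port` argument.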
### Supply Term List
Paste your term list into the "Input Terms" (first) page and press
"Submit".
### Filter synonyms
Select or de-select terms in the Synonym Set View to filter the synonyms
generated by the tool, then press "Submit".
### Generate Data Set
The generated dataset will be placed in the directory
resources/public/output/user//.
The directory should contain the following files:

    filtered-synset
    filtered-termlist.edn
    mrconso.rrf
    mrsty.rrf
    params
    synonyms.checksum
    termlist

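A quick way to confirm that a run completed is to compare the output directory against the file list above. The sketch below is a hypothetical helper, not part of CTB; the caller supplies the actual output directory path.

```python
from pathlib import Path

# File names listed in this section as the expected contents.
EXPECTED_OUTPUTS = (
    "filtered-synset",
    "filtered-termlist.edn",
    "mrconso.rrf",
    "mrsty.rrf",
    "params",
    "synonyms.checksum",
    "termlist",
)


def missing_outputs(output_dir):
    """Return the expected file names that are absent from output_dir."""
    base = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (base / name).is_file()]
```

An empty list means every expected file is present; it says nothing about whether the contents are complete.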
## For Users of the GitHub release
You will need both [Leiningen](https://leiningen.org/) and
[Maven](https://maven.apache.org/) to be installed.
The irutils 2.1 inverted file library is required to use the latest
version of CTB. In a separate directory, clone, compile, and install
irutils version 2.1 into your local Maven (and Leiningen) repository:

    $ git clone https://github.com/willjrogers/irutils.git
    $ cd irutils/java
    $ git branch rel2.1 rel-2.1
    $ git checkout rel2.1
    $ mkdir -p src/main
    $ (cd src/main && ln -s ../../sources java)
    $ mvn install

Go to the "ctb" directory, then compile and package CTB:

    $ cd ctb
    $ lein uberjar

If the uberjar builds successfully, the steps in the usage section
above should work normally.
## For Developers
### Running the system in Apache Tomcat
If you have Tomcat, you can use the file
target/ctb-0.1.0-SNAPSHOT-standalone.war to deploy the system to
Tomcat.
The application now expects the config directory (containing
ctb.properties) and the data directory (containing the indexes) to be
in the war-resources sub-directory before the war file is packaged
with the command `lein ring uberwar`.
Note: CTB has not been extensively tested in Tomcat and may require
modification to work properly.
## License
CTB is a product of the U.S. Government and is not subject to copyright.
For more information see:
http://www.usa.gov/government-works