Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/malcolmgreaves/rex
REx: Relation Extraction. Modernized re-write of the code in the master's thesis: "Relation Extraction using Distant Supervision, SVMs, and Probabilistic First-Order Logic"
machine-learning natural-language-processing relation-extraction scala
- Host: GitHub
- URL: https://github.com/malcolmgreaves/rex
- Owner: malcolmgreaves
- License: apache-2.0
- Created: 2015-02-25T19:41:46.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2018-03-07T02:35:45.000Z (almost 7 years ago)
- Last Synced: 2023-03-23T00:20:01.558Z (almost 2 years ago)
- Topics: machine-learning, natural-language-processing, relation-extraction, scala
- Language: Scala
- Size: 163 KB
- Stars: 23
- Watchers: 4
- Forks: 5
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[![Build Status](https://travis-ci.org/malcolmgreaves/rex.svg?branch=master)](https://travis-ci.org/malcolmgreaves/rex) [![Coverage Status](https://coveralls.io/repos/malcolmgreaves/rex/badge.svg)](https://coveralls.io/r/malcolmgreaves/rex)
# rex
REx: Relation Extraction. Modernized re-write of the code in the master's thesis:
"Relation Extraction using Distant Supervision, SVMs, and Probabilistic First-Order Logic"

[The thesis is here.](http://reports-archive.adm.cs.cmu.edu/anon/2014/CMU-CS-14-128.pdf)
## Setup
This project uses `sbt` for build management. If you're unfamiliar with `sbt`, see the last section
for some pointers.

##### Build
To download all dependencies and compile code, run `sbt compile`.

##### Test
To run all tests, execute `sbt test`.

Moreover, to see code coverage, first run `coverage`, then `test`. The coverage report will be
output as an HTML file.

##### Command Line Applications
To produce bash scripts that will execute each individual command-line application within this
codebase, execute `sbt pack`.

## Data
This project includes data that allows one to distantly supervise relation mentions in text.
The files are located under `data/`: a local `README` further explains the data content, format,
and purpose.

These files are large and are stored using [`git-lfs`](https://git-lfs.github.com/). Be sure to
follow the appropriate instructions and ensure that you've set up this `git` plugin (i.e. have
performed `git lfs install` once).

## Example
To evaluate relation extraction performance on the UIUC relation dataset using 3-fold
cross-validation, first build the executable scripts with `sbt pack`, then execute:
```bash
./target/pack/bin/relation-extraction-learning-main \
learn_eval \
-li data/uiuc_cog_comp_group-entity_and_relation_recognition_corpora/all.corp \
--input_format uiuc \
-cg true \
--cost 1 \
--epsilon 0.003 \
--n_cv_folds 3
```

Where:
- `learn_eval` is the command for the script
- `-li` specifies where the labeled relation data lives
- `--input_format` tells the program how to interpret the file at `-li`; `uiuc` means to use the
  UIUC relation classification data format
- `-cg true` means that candidate generation is performed
- `--cost` indicates the cost-sensitive learning parameter for the SVM
- `--epsilon` controls weight convergence: training stops when weight updates are less than this value
- `--n_cv_folds` indicates the number of folds to perform for cross-validation

Invoking this program with the `--help` flag, or with no arguments, will output a detailed help
message to stdout.

## License
Everything within this repository is copyright (2015-) by Malcolm Greaves.

Use of this code is permitted according to the stipulations of the
[Apache 2](http://www.apache.org/licenses/LICENSE-2.0.txt) license.

## How to use `sbt`
When using `sbt`, it is best to start it in the "interactive shell mode". To do this, simply
execute from the command line:
```bash
$ sbt
```

After starting up (give it a few seconds), you can execute the following commands:
```
compile // compiles code
pack // creates executable scripts
test // runs tests
coverage // initializes the code-coverage system, use right before 'test'
reload // re-loads the sbt build definition, including plugin definitions
update // grabs all dependencies
```

There are a _lot_ more commands for `sbt`, and a ton of community plugins that extend `sbt`'s
functionality.

##### Tips
Not necessary! Just a few suggestions...
We recommend using the following configuration for sbt:
```bash
sbt -J-XX:MaxPermSize=768m -J-Xmx2g -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled
```
This gives `sbt` more memory, selects a better default GC option, and enables better class loading and
unloading.

Also, to limit the logging output of the Spark framework, export this environment variable before
running tests:
```bash
export SPARK_CONF_DIR="/src/main/resources"
```
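
The two tips above can be combined in a small wrapper script. The following is only a sketch under stated assumptions: `REX_HOME` and `SBT_MEM_FLAG` are hypothetical names introduced here, and the script assumes that the Spark configuration directory is the `src/main/resources` folder inside your checkout of this repository. It passes only the heap-size flag from the recommended configuration; the remaining `-J` flags can be added if your JVM accepts them.

```bash
#!/usr/bin/env bash
# Sketch: apply the memory tip and the Spark logging tip, then run the tests.
set -eu

# Assumption: REX_HOME is a hypothetical variable naming the repository checkout.
# It defaults to the current directory.
REX_HOME="${REX_HOME:-$PWD}"

# Assumption: the Spark configuration lives in the checkout's src/main/resources.
export SPARK_CONF_DIR="${REX_HOME}/src/main/resources"

# Hypothetical variable holding the heap-size flag from the tip above.
SBT_MEM_FLAG="-J-Xmx2g"

# Show what will be used before doing anything.
echo "SPARK_CONF_DIR=${SPARK_CONF_DIR}"
echo "command: sbt ${SBT_MEM_FLAG} test"

# Run the tests only when sbt is installed and the configuration directory exists.
if command -v sbt >/dev/null 2>&1 && [ -d "${SPARK_CONF_DIR}" ]; then
  (cd "${REX_HOME}" && sbt "${SBT_MEM_FLAG}" test)
else
  echo "skipping: sbt or ${SPARK_CONF_DIR} not found" >&2
fi
```

Run it from the repository root, or set `REX_HOME` to the checkout location first. Because the script exports `SPARK_CONF_DIR` itself, the setting does not need to be repeated in each new shell session.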