{"id":24873077,"url":"https://github.com/malcolmgreaves/old_rex","last_synced_at":"2025-10-15T16:30:52.696Z","repository":{"id":27842158,"uuid":"31332291","full_name":"malcolmgreaves/old_rex","owner":"malcolmgreaves","description":"REx: Relation Extraction. Modernized re-write of the code in the master's thesis: \"Relation Extraction using Distant Supervision, SVMs, and Probabalistic First-Order Logic\"","archived":false,"fork":false,"pushed_at":"2018-03-07T02:35:45.000Z","size":167,"stargazers_count":22,"open_issues_count":2,"forks_count":5,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-01-25T05:01:59.163Z","etag":null,"topics":["machine-learning","natural-language-processing","relation-extraction","scala"],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/malcolmgreaves.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-02-25T19:41:46.000Z","updated_at":"2025-01-23T00:54:16.000Z","dependencies_parsed_at":"2022-09-03T13:25:00.627Z","dependency_job_id":null,"html_url":"https://github.com/malcolmgreaves/old_rex","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/malcolmgreaves%2Fold_rex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/malcolmgreaves%2Fold_rex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/malcolmgreaves%2Fold_rex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/malcolmgreaves%2Fold_rex/manifests","owner_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/malcolmgreaves","download_url":"https://codeload.github.com/malcolmgreaves/old_rex/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":236624388,"owners_count":19178981,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","natural-language-processing","relation-extraction","scala"],"created_at":"2025-02-01T05:27:28.465Z","updated_at":"2025-10-15T16:30:47.397Z","avatar_url":"https://github.com/malcolmgreaves.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/malcolmgreaves/rex.svg?branch=master)](https://travis-ci.org/malcolmgreaves/rex) [![Coverage Status](https://coveralls.io/repos/malcolmgreaves/rex/badge.svg)](https://coveralls.io/r/malcolmgreaves/rex)\n# rex\nREx: Relation Extraction. Modernized re-write of the code in the master's thesis: \n\"Relation Extraction using Distant Supervision, SVMs, and Probabalistic First-Order Logic\"\n\n[The thesis is  here.](http://reports-archive.adm.cs.cmu.edu/anon/2014/CMU-CS-14-128.pdf)\n\n\n## Setup\n\nThis project uses `sbt` for build management. If you're unfamiliar with `sbt`, see the last section\nfor some pointers.\n\n##### Build\nTo download all dependencies and compile code, run `sbt compile`.\n\n##### Test\nTo run all tests, execute `sbt test`.\n\nMoreover, to see code coverage, first run `coverage`, then `test`. 
The coverage report will be \noutput as an HTML file.\n\n##### Command Line Applications\nTo produce bash scripts that will execute each individual command-line application within this\ncodebase, execute `sbt pack`.\n\n\n## Data\n\nThis project includes data that allows one to distantly supervise relation mentions in text. \nThe files are located under `data/`: a local `README` further explains the data content, format,\nand purpose.  \n\nThese files are large and are stored using [`git-lfs`](https://git-lfs.github.com/). Be sure to \nfollow the appropriate instructions and ensure that you've set up this `git` plugin (i.e. have \nperformed `git lfs install` once).\n\n\n## Example\nTo evaluate relation extraction performance on the UIUC relation dataset using 3-fold cross-\nvalidation, first build the executable scripts with `sbt pack`, then execute:\n```bash\n./target/pack/bin/relation-extraction-learning-main \\\nlearn_eval \\\n-li data/uiuc_cog_comp_group-entity_and_relation_recognition_corpora/all.corp \\\n--input_format uiuc \\\n-cg true \\\n--cost 1 \\\n--epsilon 0.003 \\\n--n_cv_folds 3\n```\n\nWhere:\n- `learn_eval` is the command for the script\n- `-li` specifies where the labeled relation data lives\n- `--input_format` tells the program how to interpret the file at `-li` -- `uiuc` means to use the\nUIUC relation classification data format\n- `-cg true` means that candidate generation is performed\n- `--cost` indicates the cost-sensitive learning parameter for the SVM\n- `--epsilon` controls weight convergence: training stops when weight updates are less than this value\n- `--n_cv_folds` indicates the number of folds to perform for cross-validation\n\nInvoking this program with the `--help` flag, or with no arguments, will output a detailed help \nmessage to stdout.\n\n\n## License\nEverything within this repository is copyright (2015-) by Malcolm Greaves.\n\nUse of this code is permitted according to the stipulations of the \n[Apache 
2](http://www.apache.org/licenses/LICENSE-2.0.txt) license.\n\n\n## How to use `sbt`\nWhen using `sbt`, it is best to start it in the \"interactive shell mode\". To do this, simply\nexecute from the command line:\n```bash\n$ sbt\n```\n\nAfter starting up (give it a few seconds), you can execute the following commands:\n```\ncompile // compiles code\npack // creates executable scripts\ntest // runs tests\ncoverage // initializes the code-coverage system, use right before 'test'\nreload // re-loads the sbt build definition, including plugin definitions\nupdate // grabs all dependencies\n```\n\nThere are a _lot_ more commands for `sbt`, and a ton of community plugins that extend `sbt`'s \nfunctionality.\n\n##### Tips\n\nNot necessary! Just a few suggestions...\n\nWe recommend using the following configuration for sbt:\n```bash\nsbt -J-XX:MaxPermSize=768m -J-Xmx2g -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled\n```\nThis gives some more memory to `sbt`, gives it a better default GC option, and enables a better class loading \u0026 \nunloading module.\n\nAlso, to limit the logging output of the Spark framework, export this environment variable before \nrunning tests:\n```bash\nexport SPARK_CONF_DIR=\"\u003cYOUR_PATH_TO_THIS_REPO\u003e/src/main/resources\"\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmalcolmgreaves%2Fold_rex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmalcolmgreaves%2Fold_rex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmalcolmgreaves%2Fold_rex/lists"}