https://github.com/hscells/sysrev-screening-prioritisation
Systematic Review Screening Prioritisation
- Host: GitHub
- URL: https://github.com/hscells/sysrev-screening-prioritisation
- Owner: hscells
- Created: 2017-04-19T03:02:18.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-05-17T05:44:21.000Z (over 7 years ago)
- Last Synced: 2024-11-07T13:57:45.110Z (about 2 months ago)
- Language: Python
- Homepage:
- Size: 215 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
# Systematic Review Screening Prioritisation
_Using learning to rank._
## Setup
- elasticsearch version 5.3.0
- latest java version 1.8
- trec_eval

The experiments use the learning to rank elasticsearch plugin from:
https://github.com/o19s/elasticsearch-learning-to-rank

To build the plugin, ensure you have the `JAVA_HOME` environment variable set, and have at least java version 1.8u40.
On macOS, to set `JAVA_HOME`, use `export JAVA_HOME=$(/usr/libexec/java_home)`.

Inside the `ltr-query` submodule, run:
```bash
./gradlew run#installLtrQueryPlugin
```

This will generate a zip file that can be installed to elasticsearch as a module. To install into elasticsearch, run:
```bash
elasticsearch-plugin install file:///$(pwd)/build/distributions/ltr-query-0.1.1-es5.3.0.zip
```

I have included a convenience script called `installPlugin.sh` in this directory that will run these
commands for you.

### Elasticsearch configuration
Because the queries and models we are dealing with are very large, we need to ensure elasticsearch can handle them. My
`elasticsearch.yml` configuration file has these additional options:

```yaml
script.max_size_in_bytes: 10000000
http.max_content_length: 1gb
```

I also start elasticsearch with more heap space like so:
```bash
ES_JAVA_OPTS="-Xms6g -Xmx6g" ./bin/elasticsearch
```

## Training A Model
To train a model, this project uses RankLib. This project contains a pipeline that will perform feature extraction,
model training, re-ranking and evaluation. To run the pipeline, use:
```bash
./trainltr
```

The top of the `trainltr` file contains variables that can be changed to configure, for instance, elasticsearch settings.
Additionally, the pipeline comprises the following python scripts:

- `ltrfeatures.py`: automatically extracts features and produces RankLib training data.
- `uploadscript.py`: uploads RankLib models into the ltr elasticsearch plugin.
- `search.py`: compares the baseline similarity scores against the learning to rank similarity function.
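
For orientation, RankLib reads training data in the LETOR-style text format: one judged document per line, as `<label> qid:<id> <feature>:<value> ... # comment`. The sketch below shows how such a line could be built. It is not taken from `ltrfeatures.py`; the function name `to_ranklib_line` and the sample rows are hypothetical.

```python
def to_ranklib_line(label, qid, features, comment=""):
    """Format one judged document as a RankLib (LETOR-style) training line.

    `features` is a sequence of numbers; RankLib feature ids start at 1.
    """
    pairs = " ".join(
        f"{i}:{value:.6f}" for i, value in enumerate(features, start=1)
    )
    line = f"{label} qid:{qid} {pairs}"
    if comment:
        # Anything after '#' is treated as a comment, handy for a document id.
        line += f" # {comment}"
    return line


# Hypothetical data: one topic, two documents, three features each.
rows = [
    (1, "1", [12.5, 0.0, 3.0], "doc-a"),  # relevant
    (0, "1", [4.25, 1.0, 0.5], "doc-b"),  # not relevant
]
for label, qid, features, doc_id in rows:
    print(to_ranklib_line(label, qid, features, doc_id))
```

Lines that share a `qid` are treated as one ranked list, so all documents for a topic should carry the same query id.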
## Features

Features are constructed as subclasses of `AbstractFeature` in the `features` module. See the
[features readme](features/README.md) to explore how to extend and modify the features.
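
As a rough illustration of the subclassing pattern, here is a self-contained sketch. The real `AbstractFeature` interface is documented in the features readme and may differ: the stand-in base class, the `calc` method name and signature, and `TitleOverlapFeature` are all assumptions made for this example.

```python
from abc import ABC, abstractmethod


class AbstractFeature(ABC):
    """Stand-in for the project's base class (assumed interface)."""

    @abstractmethod
    def calc(self, query_terms, document):
        """Return a single float feature value for a query/document pair."""


class TitleOverlapFeature(AbstractFeature):
    """Hypothetical feature: fraction of query terms found in the title."""

    def calc(self, query_terms, document):
        if not query_terms:
            return 0.0
        title_terms = set(document.get("title", "").lower().split())
        hits = sum(1 for term in query_terms if term.lower() in title_terms)
        return hits / len(query_terms)


feature = TitleOverlapFeature()
doc = {"title": "Screening prioritisation for systematic reviews"}
value = feature.calc(["systematic", "screening", "cohort"], doc)
print(round(value, 3))
```

Keeping each feature in its own small class means a new signal can be added without touching the extraction pipeline, provided it follows whatever contract the real base class defines.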