https://github.com/computablefacts/morta

# Morta

![Maven Central](https://img.shields.io/maven-central/v/com.computablefacts/morta)
[![Build Status](https://travis-ci.com/computablefacts/morta.svg?branch=master)](https://travis-ci.com/computablefacts/morta)
[![codecov](https://codecov.io/gh/computablefacts/morta/branch/master/graph/badge.svg)](https://codecov.io/gh/computablefacts/morta)

Morta is a proof-of-concept Java implementation of a span categorizer using many
ideas from [Snorkel](https://www.snorkel.org/).

## Usage

Unlike Snorkel, Morta first creates Labeling Functions automatically from
user-provided Gold Labels. If handcrafted Labeling Functions exist, they are
automatically merged with the generated ones. The Labeling Functions are then used
to train a Generative Model. Finally, a Discriminative Model is trained. The output
of each step is saved as an XML file.
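
To make the idea of a Labeling Function concrete, here is a minimal sketch in the Snorkel spirit. It is not Morta's API: the class name, the vote constants and the three example rules are hypothetical, and a plain majority vote stands in for the Generative Model, which is a deliberate simplification.

```java
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical illustration only; none of these names come from Morta.
final class LabelingFunctionSketch {

  static final int ABSTAIN = -1; // the function has no opinion on the span
  static final int KO = 0;       // the span does not carry the label
  static final int OK = 1;       // the span carries the label

  /** Three toy labeling functions for an imaginary "invoice" label. */
  static List<ToIntFunction<String>> exampleFunctions() {
    return List.of(
        s -> s.toLowerCase().contains("invoice") ? OK : ABSTAIN,
        s -> s.chars().anyMatch(Character::isDigit) ? OK : ABSTAIN,
        s -> s.length() < 5 ? KO : ABSTAIN);
  }

  /**
   * Combines the votes by simple majority. Abstentions are ignored;
   * a tie (including "nobody voted") yields ABSTAIN.
   */
  static int majorityVote(List<ToIntFunction<String>> functions, String span) {
    int ok = 0;
    int ko = 0;
    for (ToIntFunction<String> f : functions) {
      int vote = f.applyAsInt(span);
      if (vote == OK) {
        ok++;
      } else if (vote == KO) {
        ko++;
      }
    }
    if (ok == ko) {
      return ABSTAIN;
    }
    return ok > ko ? OK : KO;
  }
}
```

Each function either votes or abstains on a span, and the combined vote plays the role of a noisy training label. A real generative model would additionally weigh each function by its estimated accuracy instead of counting all votes equally.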

### Creating Gold Labels

The format of a single Gold Label is:

```
{
"id": "",
"label": "",
"data": "",
"is_true_positive": <boolean>,
"is_true_negative": <boolean>,
"is_false_positive": <boolean>,
"is_false_negative": <boolean>
}
```

The Gold Labels must be grouped together in an [ND-JSON](http://ndjson.org/) file, one Gold Label per line:

```
{"id":"","label":"","data":"","is_true_positive":<boolean>,"is_true_negative":<boolean>,"is_false_positive":<boolean>,"is_false_negative":<boolean>}
{"id":"","label":"","data":"","is_true_positive":<boolean>,"is_true_negative":<boolean>,"is_false_positive":<boolean>,"is_false_negative":<boolean>}
{"id":"","label":"","data":"","is_true_positive":<boolean>,"is_true_negative":<boolean>,"is_false_positive":<boolean>,"is_false_negative":<boolean>}
...
```

The ND-JSON file must be gzipped.
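
As an illustration, the sketch below writes Gold Labels to a gzipped ND-JSON file using only the JDK. The class and method names, the output file name and the sample values are hypothetical. The JSON keys are the ones listed above, and the four flags are assumed to be booleans.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; not part of Morta.
final class GoldLabelWriter {

  /** Escapes the characters that JSON does not allow unescaped in a string. */
  static String escape(String s) {
    StringBuilder sb = new StringBuilder();
    for (char c : s.toCharArray()) {
      switch (c) {
        case '"': sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) {
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.toString();
  }

  /** Serializes one Gold Label as a single ND-JSON line (no trailing newline). */
  static String toJsonLine(String id, String label, String data,
      boolean truePositive, boolean trueNegative,
      boolean falsePositive, boolean falseNegative) {
    return "{\"id\":\"" + escape(id) + "\","
        + "\"label\":\"" + escape(label) + "\","
        + "\"data\":\"" + escape(data) + "\","
        + "\"is_true_positive\":" + truePositive + ","
        + "\"is_true_negative\":" + trueNegative + ","
        + "\"is_false_positive\":" + falsePositive + ","
        + "\"is_false_negative\":" + falseNegative + "}";
  }

  /** Writes one JSON document per line to a gzip-compressed, UTF-8 encoded file. */
  static void write(Path file, List<String> jsonLines) throws IOException {
    try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
        new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8))) {
      for (String line : jsonLines) {
        writer.write(line);
        writer.write('\n');
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Sample values are made up for the example.
    List<String> lines = List.of(
        toJsonLine("1", "my_label", "first span", true, false, false, false),
        toJsonLine("2", "my_label", "second span", false, true, false, false));
    write(Paths.get("gold_labels.json.gz"), lines);
  }
}
```

A JSON library such as Jackson or Gson would normally replace the hand-written escaping; it is done manually here only to keep the sketch dependency-free.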

### Training a span categorizer

To automatically train a new span categorizer from a set of Gold Labels,
run the following command:

```
java -Xms4g -Xmx8g com.computablefacts.morta.SaturatedDive \
-verbose true \
-facts "/home/user/2022-02-20_19-57-17/facts.prod.smacl.dab.json.gz" \
-documents "/home/user/2022-02-20_19-57-17/documents.prod.smacl.dab.json.gz" \
-output_directory "/home/user/2022-02-20_19-57-17"
```

Add `-label my_label` to train the span categorizer on `my_label` only.

## Adding Morta to your build

Morta's Maven group ID is `com.computablefacts` and its artifact ID is `morta`.

To add a dependency on Morta using Maven, use the following:

```xml
<dependency>
  <groupId>com.computablefacts</groupId>
  <artifactId>morta</artifactId>
  <version>1.x</version>
</dependency>
```

## Snapshots

Snapshots of Morta built from the `master` branch are available through Sonatype
using the following dependency:

```xml
<dependency>
  <groupId>com.computablefacts</groupId>
  <artifactId>morta</artifactId>
  <version>1.x-SNAPSHOT</version>
</dependency>
```

To download snapshots from Sonatype, add the following profile to your project's
`pom.xml`:

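```xml
<profiles>
  <profile>
    <id>allow-snapshots</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <repositories>
      <repository>
        <id>snapshots-repo</id>
        <url>https://s01.oss.sonatype.org/content/repositories/snapshots</url>
        <releases>
          <enabled>false</enabled>
        </releases>
        <snapshots>
          <enabled>true</enabled>
        </snapshots>
      </repository>
    </repositories>
  </profile>
</profiles>
```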

## Publishing a new version

Deploy a release to Maven Central with these commands:

```bash
$ git tag <version>
$ git push origin <version>
```

To update and publish the next SNAPSHOT version, change the version and push it:

```bash
$ mvn versions:set -DnewVersion=<next_version>-SNAPSHOT
$ git commit -am "Update to version <next_version>-SNAPSHOT"
$ git push origin master
```