https://github.com/computablefacts/morta
Morta is a proof-of-concept Java implementation of a span categorizer.
- Host: GitHub
- URL: https://github.com/computablefacts/morta
- Owner: computablefacts
- License: apache-2.0
- Archived: true
- Created: 2020-09-03T10:13:38.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2022-05-31T15:45:45.000Z (about 4 years ago)
- Last Synced: 2025-07-24T15:51:45.591Z (11 months ago)
- Topics: data-science, java-library, machine-learning, text-classification
- Language: Java
- Homepage:
- Size: 424 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Morta

[Build Status](https://travis-ci.com/computablefacts/morta)
[Code Coverage](https://codecov.io/gh/computablefacts/morta)
Morta is a proof-of-concept Java implementation of a span categorizer using many
ideas from [Snorkel](https://www.snorkel.org/).
## Usage
Unlike Snorkel, Morta first creates Labeling Functions automatically from
user-provided gold labels (if needed, these functions are then merged automatically
with handcrafted Labeling Functions). The Labeling Functions are then used to train
a Generative Model. Finally, a Discriminative Model is trained. The output of each
step is saved as an XML file.
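To make the idea of a Labeling Function concrete, here is a minimal sketch in the Snorkel spirit: each function looks at a span of text and either votes for a class or abstains. Nothing below is Morta's API. The class name, the three example heuristics and the `-1`/`0`/`1` encoding are assumptions made for illustration, and a plain majority vote stands in for the Generative Model, which in practice also weighs how reliable each function is.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch, not Morta's API.
final class LabelingFunctionSketch {

  static final int ABSTAIN = -1;
  static final int NEGATIVE = 0;
  static final int POSITIVE = 1;

  // Three hand-written heuristics; each one votes or abstains.
  static final Function<String, Integer> MENTIONS_INVOICE =
      t -> t.toLowerCase(Locale.ROOT).contains("invoice") ? POSITIVE : ABSTAIN;
  static final Function<String, Integer> HAS_AMOUNT =
      t -> t.matches(".*\\d+ ?EUR.*") ? POSITIVE : ABSTAIN;
  static final Function<String, Integer> IS_GREETING =
      t -> t.toLowerCase(Locale.ROOT).startsWith("hello") ? NEGATIVE : ABSTAIN;

  static final List<Function<String, Integer>> EXAMPLE_LFS =
      List.of(MENTIONS_INVOICE, HAS_AMOUNT, IS_GREETING);

  // Majority vote over the non-abstaining functions.
  // Returns ABSTAIN when nobody votes or when the vote is tied.
  static int majorityVote(List<Function<String, Integer>> lfs, String text) {
    int positives = 0;
    int negatives = 0;
    for (Function<String, Integer> lf : lfs) {
      int vote = lf.apply(text);
      if (vote == POSITIVE) {
        positives++;
      } else if (vote == NEGATIVE) {
        negatives++;
      }
    }
    if (positives == negatives) {
      return ABSTAIN;
    }
    return positives > negatives ? POSITIVE : NEGATIVE;
  }

  public static void main(String[] args) {
    System.out.println(majorityVote(EXAMPLE_LFS, "Invoice total due: 42 EUR"));
  }
}
```

The votes produced this way are noisy training labels. In the pipeline described above, they would feed the Generative Model, whose output is in turn used to train the Discriminative Model.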
### Creating Gold Labels
The format of a single Gold Label is:
```
{
  "id": "",
  "label": "",
  "data": "",
  "is_true_positive": <boolean>,
  "is_true_negative": <boolean>,
  "is_false_positive": <boolean>,
  "is_false_negative": <boolean>
}
```
The Gold Labels must be grouped together in an [ND-JSON](http://ndjson.org/) file:
```
{"id":"","label":"","data":"","is_true_positive":<boolean>,"is_true_negative":<boolean>,"is_false_positive":<boolean>,"is_false_negative":<boolean>}
{"id":"","label":"","data":"","is_true_positive":<boolean>,"is_true_negative":<boolean>,"is_false_positive":<boolean>,"is_false_negative":<boolean>}
{"id":"","label":"","data":"","is_true_positive":<boolean>,"is_true_negative":<boolean>,"is_false_positive":<boolean>,"is_false_negative":<boolean>}
...
```
The ND-JSON file must be gzipped.
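As an illustration, a gzipped ND-JSON payload with the seven fields listed above could be produced with the JDK alone, roughly as follows. This is a sketch under assumptions, not code from Morta: the class `GoldLabelFile`, its methods and the `outcome` argument are hypothetical, and the sketch assumes that the four `is_*` fields are booleans of which exactly one is true per label.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Morta.
final class GoldLabelFile {

  // Escapes the characters that must be escaped inside a JSON string.
  static String escape(String s) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"': sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) {
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.toString();
  }

  // Builds one ND-JSON line. `outcome` is an assumed convenience argument
  // ("TP", "TN", "FP" or "FN") that sets exactly one flag to true.
  static String toJsonLine(String id, String label, String data, String outcome) {
    return "{\"id\":\"" + escape(id) + "\""
        + ",\"label\":\"" + escape(label) + "\""
        + ",\"data\":\"" + escape(data) + "\""
        + ",\"is_true_positive\":" + "TP".equals(outcome)
        + ",\"is_true_negative\":" + "TN".equals(outcome)
        + ",\"is_false_positive\":" + "FP".equals(outcome)
        + ",\"is_false_negative\":" + "FN".equals(outcome)
        + "}";
  }

  // Gzips the lines as UTF-8 text, one JSON object per line.
  static byte[] gzip(List<String> lines) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (Writer w = new OutputStreamWriter(new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
      for (String line : lines) {
        w.write(line);
        w.write('\n');
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads the lines back, which is handy for checking a file before training.
  static List<String> gunzip(byte[] gz) {
    List<String> lines = new ArrayList<>();
    try (BufferedReader r = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(new ByteArrayInputStream(gz)), StandardCharsets.UTF_8))) {
      String line;
      while ((line = r.readLine()) != null) {
        lines.add(line);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lines;
  }

  public static void main(String[] args) {
    String line = toJsonLine("doc-1", "my_label", "some text", "TP");
    System.out.println(gunzip(gzip(List.of(line))).get(0));
  }
}
```

The resulting bytes can be written to disk with `java.nio.file.Files.write`. In a real project a JSON library would be a safer choice than the hand-rolled escaping used here to keep the sketch dependency-free.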
### Training a span categorizer
To automatically train a new span categorizer from a set of Gold Labels,
run the following command:
```
java -Xms4g -Xmx8g com.computablefacts.morta.SaturatedDive \
-verbose true \
-facts "/home/user/2022-02-20_19-57-17/facts.prod.smacl.dab.json.gz" \
-documents "/home/user/2022-02-20_19-57-17/documents.prod.smacl.dab.json.gz" \
-output_directory "/home/user/2022-02-20_19-57-17"
```
Add `-label my_label` to train the span categorizer on `my_label` only.
## Adding Morta to your build
Morta's Maven group ID is `com.computablefacts` and its artifact ID is `morta`.
To add a dependency on Morta using Maven, use the following:
```xml
<dependency>
  <groupId>com.computablefacts</groupId>
  <artifactId>morta</artifactId>
  <version>1.x</version>
</dependency>
```
## Snapshots
Snapshots of Morta built from the `master` branch are available through Sonatype
using the following dependency:
```xml
<dependency>
  <groupId>com.computablefacts</groupId>
  <artifactId>morta</artifactId>
  <version>1.x-SNAPSHOT</version>
</dependency>
```
To download snapshots from Sonatype, add the following profile
to your project's `pom.xml`:
```xml
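<profile>
  <id>allow-snapshots</id>
  <activation><activeByDefault>true</activeByDefault></activation>
  <repositories>
    <repository>
      <id>snapshots-repo</id>
      <url>https://s01.oss.sonatype.org/content/repositories/snapshots</url>
      <releases><enabled>false</enabled></releases>
      <snapshots><enabled>true</enabled></snapshots>
    </repository>
  </repositories>
</profile>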
```
## Publishing a new version
Deploy a release to Maven Central with these commands:
```bash
$ git tag <version>
$ git push origin <version>
```
To update and publish the next SNAPSHOT version, just change and push the version:
```bash
$ mvn versions:set -DnewVersion=<version>-SNAPSHOT
$ git commit -am "Update to version <version>-SNAPSHOT"
$ git push origin master
```