Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ankane/mitie-ruby
Named-entity recognition for Ruby
https://github.com/ankane/mitie-ruby
named-entity-recognition ner
Last synced: 13 days ago
JSON representation
Named-entity recognition for Ruby
- Host: GitHub
- URL: https://github.com/ankane/mitie-ruby
- Owner: ankane
- License: bsl-1.0
- Created: 2020-09-14T20:03:01.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2024-07-27T04:33:23.000Z (3 months ago)
- Last Synced: 2024-10-04T10:54:40.017Z (about 1 month ago)
- Topics: named-entity-recognition, ner
- Language: Ruby
- Homepage:
- Size: 108 KB
- Stars: 135
- Watchers: 3
- Forks: 5
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# MITIE Ruby
[MITIE](https://github.com/mit-nlp/MITIE) - named-entity recognition, binary relation detection, and text categorization - for Ruby
- Finds people, organizations, and locations in text
- Detects relationships between entities, like `PERSON` was born in `LOCATION`[![Build Status](https://github.com/ankane/mitie-ruby/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/mitie-ruby/actions)
## Installation
Add this line to your application’s Gemfile:
```ruby
gem "mitie"
```And download the pre-trained models for your language:
- [English](https://github.com/mit-nlp/MITIE/releases/download/v0.4/MITIE-models-v0.2.tar.bz2)
- [Spanish](https://github.com/mit-nlp/MITIE/releases/download/v0.4/MITIE-models-v0.2-Spanish.zip)
- [German](https://github.com/mit-nlp/MITIE/releases/download/v0.4/MITIE-models-v0.2-German.tar.bz2)## Getting Started
- [Named Entity Recognition](#named-entity-recognition)
- [Binary Relation Detection](#binary-relation-detection)
- [Text Categorization](#text-categorization)## Named Entity Recognition
Load an NER model
```ruby
model = Mitie::NER.new("ner_model.dat")
```Create a document
```ruby
doc = model.doc("Nat works at GitHub in San Francisco")
```Get entities
```ruby
doc.entities
```This returns
```ruby
[
{text: "Nat", tag: "PERSON", score: 0.3112371212688382, offset: 0},
{text: "GitHub", tag: "ORGANIZATION", score: 0.5660115198329334, offset: 13},
{text: "San Francisco", tag: "LOCATION", score: 1.3890524313885309, offset: 23}
]
```Get tokens
```ruby
doc.tokens
```Get tokens and their offset
```ruby
doc.tokens_with_offset
```Get all tags for a model
```ruby
model.tags
```### Training
Load an NER model into a trainer
```ruby
trainer = Mitie::NERTrainer.new("total_word_feature_extractor.dat")
```Create training instances
```ruby
tokens = ["You", "can", "do", "machine", "learning", "in", "Ruby", "!"]
instance = Mitie::NERTrainingInstance.new(tokens)
instance.add_entity(3..4, "topic") # machine learning
instance.add_entity(6..6, "language") # Ruby
```Add the training instances to the trainer
```ruby
trainer.add(instance)
```Train the model
```ruby
model = trainer.train
```Save the model
```ruby
model.save_to_disk("ner_model.dat")
```## Binary Relation Detection
Detect relationships betweens two entities, like:
- `PERSON` was born in `LOCATION`
- `ORGANIZATION` was founded in `LOCATION`
- `FILM` was directed by `PERSON`There are 21 detectors for English. You can find them in the `binary_relations` directory in the model download.
Load a detector
```ruby
detector = Mitie::BinaryRelationDetector.new("rel_classifier_organization.organization.place_founded.svm")
```And create a document
```ruby
doc = model.doc("Shopify was founded in Ottawa")
```Get relations
```ruby
detector.relations(doc)
```This returns
```ruby
[{first: "Shopify", second: "Ottawa", score: 0.17649169745814464}]
```### Training
Load an NER model into a trainer
```ruby
trainer = Mitie::BinaryRelationTrainer.new(model)
```Add positive and negative examples to the trainer
```ruby
tokens = ["Shopify", "was", "founded", "in", "Ottawa"]
trainer.add_positive_binary_relation(tokens, 0..0, 4..4)
trainer.add_negative_binary_relation(tokens, 4..4, 0..0)
```Train the detector
```ruby
detector = trainer.train
```Save the detector
```ruby
detector.save_to_disk("binary_relation_detector.svm")
```## Text Categorization
Load a model into a trainer
```ruby
trainer = Mitie::TextCategorizerTrainer.new("total_word_feature_extractor.dat")
```Add labeled text to the trainer
```ruby
trainer.add("This is super cool", "positive")
```Train the model
```ruby
model = trainer.train
```Save the model
```ruby
model.save_to_disk("text_categorization_model.dat")
```Load a saved model
```ruby
model = Mitie::TextCategorizer.new("text_categorization_model.dat")
```Categorize text
```ruby
model.categorize("What a super nice day")
```## Deployment
Check out [Trove](https://github.com/ankane/trove) for deploying models.
```sh
trove push ner_model.dat
```## History
View the [changelog](https://github.com/ankane/mitie-ruby/blob/master/CHANGELOG.md)
## Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- [Report bugs](https://github.com/ankane/mitie-ruby/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/mitie-ruby/pulls)
- Write, clarify, or fix documentation
- Suggest or add new featuresTo get started with development:
```sh
git clone https://github.com/ankane/mitie-ruby.git
cd mitie-ruby
bundle install
bundle exec rake vendor:allexport MITIE_MODELS_PATH=path/to/MITIE-models/english
bundle exec rake test
```