https://github.com/mbejda/node-opennlp

Apache OpenNLP wrapper for Nodejs
javascript nlp nlp-library node-opennlp opennlp punctuation sentence


Host: GitHub
URL: https://github.com/mbejda/node-opennlp
Owner: mbejda
License: mit
Created: 2015-02-09T15:50:20.000Z (over 11 years ago)
Default Branch: master
Last Pushed: 2019-04-12T13:40:01.000Z (over 7 years ago)
Last Synced: 2025-09-09T02:47:45.632Z (10 months ago)
Topics: javascript, nlp, nlp-library, node-opennlp, opennlp, punctuation, sentence
Language: JavaScript
Homepage: https://opennlp.apache.org/
Size: 19.7 MB
Stars: 56
Watchers: 5
Forks: 17
Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE

NodeJs OpenNLP
===============

[![Build Status](https://travis-ci.org/mbejda/Node-OpenNLP.svg?branch=master)](https://nodei.co/npm/opennlp/)

# Node OpenNLP - (OpenNLP 1.6.0)

## OpenNLP Wrapper For Node.js

Node-OpenNLP depends on `Node-Java`. Please make sure your environment is properly configured to run `Node-Java`. See the [Node-Java repository](https://github.com/joeferner/node-java) to learn more about `Node-Java`.

### Installation

```
npm install opennlp --save
```

Node-OpenNLP comes with **Apache OpenNLP 1.6.0** along with the following trained 1.5 series models:

 * en-chunker.bin

 * en-ner-person.bin

 * en-pos-maxent.bin

 * en-sent.bin

 * en-token.bin

More trained models can be found here:

http://opennlp.sourceforge.net/models-1.5

### Sentence Detector

The OpenNLP Sentence Detector detects whether a punctuation character marks the end of a sentence. In this sense, a sentence is defined as the longest whitespace-trimmed character sequence between two punctuation marks.

```Javascript
var openNLP = require("opennlp");

var sentence = 'Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';
var sentenceDetector = new openNLP().sentenceDetector;

sentenceDetector.sentDetect(sentence, function(err, results) {
    // To get probabilities
    sentenceDetector.probs(function(error, probability) {
        console.log(error, probability);
    });

    console.log(results);
});
```
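To see why deciding "end of sentence or not" takes more than a fixed rule, here is a minimal sketch of a naive splitter that treats every `.`, `!` or `?` as a boundary. The function `naiveSentences` and the sample text are illustrative assumptions and are not part of Node-OpenNLP.

```Javascript
// Naive baseline for comparison; not part of Node-OpenNLP.
// Treats every '.', '!' or '?' as a sentence boundary.
function naiveSentences(text) {
    var parts = text.match(/[^.!?]+[.!?]+/g) || [];
    return parts.map(function(part) {
        return part.trim();
    });
}

var text = 'Mr. Vinken will join the board Nov. 29. He is 61 years old.';
var naive = naiveSentences(text);

// The abbreviations "Mr." and "Nov." are wrongly treated as sentence ends.
console.log(naive);
```

The naive rule yields four fragments for two sentences, which is the kind of error a trained sentence model is meant to avoid.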

### Configurations 

The following default configurations can be overridden during initialization.

```Javascript
var openNLP = require("opennlp");

var opennlp = new openNLP({
    // Paths to the trained model files
    models: {
        doccat: __dirname + '/models/en-doccat.bin',
        posTagger: __dirname + '/models/en-pos-maxent.bin',
        tokenizer: __dirname + '/models/en-token.bin',
        nameFinder: __dirname + '/models/en-ner-person.bin',
        sentenceDetector: __dirname + '/models/en-sent.bin',
        chunker: __dirname + '/models/en-chunker.bin'
    },
    // Path to the Apache OpenNLP jar
    openNLP: {
        jar: __dirname + "/lib/opennlp-tools-1.6.0.jar"
    }
});
```
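If the model files live in a directory of your own, the `models` map can be generated instead of typed out. The following is a rough sketch only: `buildModelPaths` and the `my-models` directory are hypothetical names, and the keys and file names are simply copied from the defaults listed above.

```Javascript
var path = require("path");

// Hypothetical helper: builds the `models` map for a custom directory,
// using the same keys and file names as the defaults listed above.
function buildModelPaths(modelsDir) {
    var files = {
        doccat: 'en-doccat.bin',
        posTagger: 'en-pos-maxent.bin',
        tokenizer: 'en-token.bin',
        nameFinder: 'en-ner-person.bin',
        sentenceDetector: 'en-sent.bin',
        chunker: 'en-chunker.bin'
    };
    var models = {};
    Object.keys(files).forEach(function(key) {
        models[key] = path.join(modelsDir, files[key]);
    });
    return models;
}

// 'my-models' is a placeholder directory, resolved against the working directory.
var config = { models: buildModelPaths(path.resolve('my-models')) };
console.log(config.models.tokenizer);

// With Node-OpenNLP installed, the object would be passed to the constructor:
// var opennlp = new (require("opennlp"))(config);
```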

### Tokenizer

The OpenNLP Tokenizers segment an input character sequence into tokens. Tokens are usually words, punctuation, numbers, etc.

```Javascript
var openNLP = require("opennlp");

var sentence = 'Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';
var tokenizer = new openNLP().tokenizer;

tokenizer.tokenize(sentence, function(err, results) {
    console.log(err, results);

    tokenizer.getTokenProbabilities(function(error, response) {
        console.log(error, response);
    });
});
```
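The tools in this README share a two-step pattern: call the main method, then ask for probabilities in a second callback. One possible way to collapse the two steps is sketched below. `withProbabilities` is a hypothetical helper, and `stubTokenizer` is a stand-in that only mimics the callback shape so the sketch runs without Java; its whitespace splitting and its probability values of 1 are placeholders, not model output.

```Javascript
// Hypothetical helper: runs a tool method and then its probability method,
// and hands both results to a single callback.
function withProbabilities(tool, method, probsMethod, input, callback) {
    tool[method](input, function(err, results) {
        if (err) { return callback(err); }
        tool[probsMethod](function(probErr, probabilities) {
            if (probErr) { return callback(probErr); }
            callback(null, { results: results, probabilities: probabilities });
        });
    });
}

// Stand-in for `new openNLP().tokenizer` so the sketch runs without Java.
// It splits on whitespace and reports a placeholder probability of 1 per token.
var stubTokenizer = {
    lastTokens: [],
    tokenize: function(text, cb) {
        this.lastTokens = text.split(/\s+/);
        cb(null, this.lastTokens);
    },
    getTokenProbabilities: function(cb) {
        cb(null, this.lastTokens.map(function() { return 1; }));
    }
};

var combined = null;
withProbabilities(stubTokenizer, 'tokenize', 'getTokenProbabilities',
    'Pierre Vinken , 61 years old', function(err, out) {
        combined = out;
        console.log(err, out);
    });
```

With the real package, the first argument would presumably be `new openNLP().tokenizer`, and the same helper could be tried with `sentDetect`/`probs` or `find`/`probs`.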

### Name Finder

The Name Finder can detect named entities and numbers in text. To detect entities, the Name Finder needs a model. The model depends on the language and the entity type it was trained for.

```Javascript
var openNLP = require("opennlp");

var sentence = 'Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';
var nameFinder = new openNLP().nameFinder;

nameFinder.find(sentence, function(err, tokens_arr) {
    console.log(err, tokens_arr);

    nameFinder.probs(function(error, response) {
        console.log(error, response);
    });
});
```

### Document Categorizer

The OpenNLP Document Categorizer can classify text into pre-defined categories. It is based on the maximum entropy framework.

**To use the document categorizer you need to train a model first. The default trained model that is included is for testing purposes only.**

```Javascript
var openNLP = require("opennlp");

var doccat = new openNLP().doccat;

doccat.categorize("I enjoyed watching Rocky", function(err, list) {
    doccat.getAllResults(list, function(err, category) {
    });

    doccat.getBestCategory(list, function(err, category) {
    });
});

doccat.scoreMap("I enjoyed watching Rocky", function(err, category) {
});

doccat.sortedScoreMap("I enjoyed watching Rocky", function(err, category) {
});

doccat.getCategory(1, function(err, category) {
});

doccat.getIndex('Happy', function(err, index) {
});
```
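Conceptually, picking the best category means taking the category with the highest score. The sketch below shows that idea in plain JavaScript. `bestCategory` is a hypothetical helper, the scores are made up, and the assumption that a score map arrives as a plain object of category names to numbers is not confirmed by this README.

```Javascript
// Hypothetical helper: returns the category with the highest score.
// Assumes `scores` is a plain object mapping category name to score.
function bestCategory(scores) {
    var best = null;
    Object.keys(scores).forEach(function(category) {
        if (best === null || scores[category] > scores[best]) {
            best = category;
        }
    });
    return best;
}

// Made-up scores for illustration only, not real model output.
var exampleScores = { Happy: 0.7, Sad: 0.3 };
console.log(bestCategory(exampleScores));
```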

### Part-of-Speech Tagger

The Part-of-Speech Tagger marks tokens with their corresponding word type based on the token itself and the context of the token. A token might have multiple POS tags depending on the token and the context. The OpenNLP POS Tagger uses a probability model to predict the correct POS tag out of the tag set.

```Javascript
var openNLP = require("opennlp");

var posTagger = new openNLP().posTagger;
var sentence = 'Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';

posTagger.tag(sentence, function(err, tokens_arr) {
    console.log(err, tokens_arr);
});

posTagger.topKSequences(sentence, function(error, tagger) {
    console.log(tagger.getScore());
    console.log(tagger.getProbs());
    console.log(tagger.getOutcomes());
});
```
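Tagged text is often displayed as `token_TAG` pairs. The sketch below shows one way to build that string, assuming the tagger yields one tag per token in order, which this README does not state. `formatTagged` is a hypothetical helper and the tags are hand-written Penn Treebank tags, not tagger output.

```Javascript
// Hypothetical helper: joins tokens and tags into "token_TAG" pairs.
function formatTagged(tokens, tags) {
    if (tokens.length !== tags.length) {
        throw new Error('tokens and tags must have the same length');
    }
    return tokens.map(function(token, i) {
        return token + '_' + tags[i];
    }).join(' ');
}

// Hand-written Penn Treebank tags for illustration, not tagger output.
var tokens = ['Pierre', 'Vinken', 'will', 'join', 'the', 'board'];
var tags = ['NNP', 'NNP', 'MD', 'VB', 'DT', 'NN'];
console.log(formatTagged(tokens, tags));
```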

### Chunker

Text chunking divides a text into syntactically correlated groups of words, such as noun groups and verb groups, without specifying their internal structure or their role in the main sentence.

```Javascript
var openNLP = require("opennlp");

var posTagger = new openNLP().posTagger;
var chunker = new openNLP().chunker;
var sentence = 'Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';

// The chunker needs the POS tags, so tag the sentence first.
posTagger.tag(sentence, function(err, tags) {
    chunker.topKSequences(sentence, tags, function(err, sequences) {
        console.log(err, sequences);
    });

    chunker.chunk(sentence, tags, function(err, chunks) {
        chunker.probs(function(error, prob) {
        });
    });
});
```
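OpenNLP chunk tags follow a begin/inside scheme such as `B-NP`, `I-NP` and `O`. The sketch below groups tokens into phrases from such tags, assuming one tag per token. `groupChunks` is a hypothetical helper and the sample tags are hand-written for illustration, not chunker output.

```Javascript
// Hypothetical helper: groups tokens into phrases from BIO-style chunk tags
// ("B-NP" starts a noun phrase, "I-NP" continues it, "O" is outside any chunk).
function groupChunks(tokens, chunkTags) {
    var chunks = [];
    var current = null;
    tokens.forEach(function(token, i) {
        var tag = chunkTags[i];
        if (tag.indexOf('B-') === 0) {
            current = { type: tag.slice(2), tokens: [token] };
            chunks.push(current);
        } else if (tag.indexOf('I-') === 0 && current && current.type === tag.slice(2)) {
            current.tokens.push(token);
        } else {
            current = null; // "O" or an inconsistent tag ends the open chunk
        }
    });
    return chunks.map(function(chunk) {
        return { type: chunk.type, text: chunk.tokens.join(' ') };
    });
}

// Hand-written tags for illustration, not chunker output.
var chunkTokens = ['Pierre', 'Vinken', 'will', 'join', 'the', 'board', '.'];
var chunkTags = ['B-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'I-NP', 'O'];
console.log(groupChunks(chunkTokens, chunkTags));
```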



Please report any bugs. Feel free to send me a tweet if you need any help.



Follow me on Twitter

[@notmilobejda](https://twitter.com/notmilobejda)


My Blog

[mbejda.com](https://mbejda.com)


[![Support via Gratipay](https://cdn.rawgit.com/gratipay/gratipay-badge/2.3.0/dist/gratipay.svg)](https://gratipay.com/mbejda/)

[![NPM](https://nodei.co/npm/opennlp.png)](https://nodei.co/npm/opennlp/)
