https://github.com/ojj11/pos-tagger.js

This is a rewrite of the Stanford Part-Of-Speech Log-Linear tagger in Kotlin, it is compiled to JavaScript and made available through npm. No background Java service is needed.
https://github.com/ojj11/pos-tagger.js

kotlin-js kotlin-multiplatform natural-language-processing part-of-speech part-of-speech-tagger

Last synced: 8 months ago
JSON representation

This is a rewrite of the Stanford Part-Of-Speech Log-Linear tagger in Kotlin, it is compiled to JavaScript and made available through npm. No background Java service is needed.

Host: GitHub
URL: https://github.com/ojj11/pos-tagger.js
Owner: ojj11
License: gpl-2.0
Created: 2020-06-28T01:06:45.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2023-10-27T15:07:24.000Z (about 2 years ago)
Last Synced: 2024-10-18T07:53:16.302Z (about 1 year ago)
Topics: kotlin-js, kotlin-multiplatform, natural-language-processing, part-of-speech, part-of-speech-tagger
Language: Kotlin
Homepage:
Size: 33.7 MB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # pos-tagger.js

![Build and test the library](https://github.com/ojj11/pos-tagger.js/workflows/Build%20and%20test%20the%20library/badge.svg?event=push)

> A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'.

This is a rewrite of the [Stanford Part-Of-Speech Log-Linear tagger](https://nlp.stanford.edu/software/tagger.shtml) in Kotlin, it is compiled to JavaScript and made available through npm. No background Java service is needed.

This module includes two models:

 - left3words-wsj-0-18

 - bidirectional-distsim-wsj-0-18

The total package size (including both models) is under 10mb. Basic benchmarks show that the JavaScript library has similar performance to that of the original Java code for the "left3words" model.

    License: GPL v2 or above

## Usage

Install pos-tagger.js from [npm](https://www.npmjs.com/pos-tagger.js):

> npm install pos-tagger.js

###### Example code:

```javascript

const Tagger = require("pos-tagger.js");

const tagger = new Tagger(Tagger.readModelSync("left3words-wsj-0-18"));

// alternatively

// const tagger = new Tagger(Tagger.readModelSync("bidirectional-distsim-wsj-0-18"));

const output = tagger.tag("I am a happy part-of-speech tagger. How do you do?");

console.log(output);

console.log("First word is a " + output[0][0].tag);

```

`tag` takes a string representing one or more "." terminated sentences. The output is a list with one element per input sentence. Each sentence element is itself a list with an element per input token, each element contains a `word` key with the original token, and a `tag` key which contains the [Penn Treebank part-of-speech tag](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html).

###### Example output:

```json

[

  [

    { "word": "I", "tag": "PRP" },

    { "word": "am", "tag": "VBP" },

    { "word": "a", "tag": "DT" },

    { "word": "happy", "tag": "JJ" },

    { "word": "part-of-speech", "tag": "JJ" },

    { "word": "tagger", "tag": "NN" },

    { "word": ".", "tag": "." }

  ],

  [

    { "word": "How", "tag": "WRB" },

    { "word": "do", "tag": "VBP" },

    { "word": "you", "tag": "PRP" },

    { "word": "do", "tag": "VB" },

    { "word": "?", "tag": "."

    }

  ]

]

First word is a PRP

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/ojj11/pos-tagger.js

Awesome Lists containing this project

README