https://github.com/ojj11/pos-tagger.js
This is a rewrite of the Stanford Part-Of-Speech Log-Linear tagger in Kotlin. It is compiled to JavaScript and made available through npm, so no background Java service is needed.
- Host: GitHub
- URL: https://github.com/ojj11/pos-tagger.js
- Owner: ojj11
- License: gpl-2.0
- Created: 2020-06-28T01:06:45.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-10-27T15:07:24.000Z (over 1 year ago)
- Last Synced: 2024-10-18T07:53:16.302Z (4 months ago)
- Topics: kotlin-js, kotlin-multiplatform, natural-language-processing, part-of-speech, part-of-speech-tagger
- Language: Kotlin
- Homepage:
- Size: 33.7 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# pos-tagger.js
![Build and test the library](https://github.com/ojj11/pos-tagger.js/workflows/Build%20and%20test%20the%20library/badge.svg?event=push)
> A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'.
This is a rewrite of the [Stanford Part-Of-Speech Log-Linear tagger](https://nlp.stanford.edu/software/tagger.shtml) in Kotlin. It is compiled to JavaScript and made available through npm, so no background Java service is needed.
This module includes two models:
- left3words-wsj-0-18
- bidirectional-distsim-wsj-0-18

The total package size (including both models) is under 10 MB. Basic benchmarks show that the JavaScript library has similar performance to that of the original Java code for the "left3words" model.
License: GPL v2 or above
## Usage
Install pos-tagger.js from [npm](https://www.npmjs.com/pos-tagger.js):
> npm install pos-tagger.js
###### Example code:
```javascript
const Tagger = require("pos-tagger.js");
const tagger = new Tagger(Tagger.readModelSync("left3words-wsj-0-18"));
// alternatively
// const tagger = new Tagger(Tagger.readModelSync("bidirectional-distsim-wsj-0-18"));

const output = tagger.tag("I am a happy part-of-speech tagger. How do you do?");
console.log(output);
console.log("First word is a " + output[0][0].tag);
```

`tag` takes a string representing one or more "."-terminated sentences. The output is a list with one element per input sentence. Each sentence element is itself a list with one element per input token. Each token element has a `word` key holding the original token and a `tag` key holding its [Penn Treebank part-of-speech tag](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html).
###### Example output:
```json
[
  [
    { "word": "I", "tag": "PRP" },
    { "word": "am", "tag": "VBP" },
    { "word": "a", "tag": "DT" },
    { "word": "happy", "tag": "JJ" },
    { "word": "part-of-speech", "tag": "JJ" },
    { "word": "tagger", "tag": "NN" },
    { "word": ".", "tag": "." }
  ],
  [
    { "word": "How", "tag": "WRB" },
    { "word": "do", "tag": "VBP" },
    { "word": "you", "tag": "PRP" },
    { "word": "do", "tag": "VB" },
    { "word": "?", "tag": "." }
  ]
]
First word is a PRP
```
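
###### Working with the output:

Because the result is a plain nested array of `{ word, tag }` objects, it can be processed with ordinary array methods. The sketch below is illustrative only: `wordsWithTagPrefix` and `toTaggedStrings` are hypothetical helper names that are not part of pos-tagger.js, and the `output` literal is copied from the example output above so that the snippet runs without loading a model. In a real program `output` would be the return value of `tagger.tag(...)`.

```javascript
// Sample data copied from the example output above. In practice this would
// come from tagger.tag("I am a happy part-of-speech tagger. How do you do?").
const output = [
  [
    { word: "I", tag: "PRP" },
    { word: "am", tag: "VBP" },
    { word: "a", tag: "DT" },
    { word: "happy", tag: "JJ" },
    { word: "part-of-speech", tag: "JJ" },
    { word: "tagger", tag: "NN" },
    { word: ".", tag: "." }
  ],
  [
    { word: "How", tag: "WRB" },
    { word: "do", tag: "VBP" },
    { word: "you", tag: "PRP" },
    { word: "do", tag: "VB" },
    { word: "?", tag: "." }
  ]
];

// Hypothetical helper (not part of the library): collect every word whose
// Penn Treebank tag starts with the given prefix, across all sentences.
// For example, the prefix "VB" matches VB, VBP, VBD and so on.
function wordsWithTagPrefix(taggedSentences, prefix) {
  return taggedSentences
    .flat() // merge the per-sentence lists into one list of tokens
    .filter((token) => token.tag.startsWith(prefix))
    .map((token) => token.word);
}

// Hypothetical helper (not part of the library): render each sentence as a
// single string of "word_TAG" pairs separated by spaces.
function toTaggedStrings(taggedSentences) {
  return taggedSentences.map((sentence) =>
    sentence.map((token) => `${token.word}_${token.tag}`).join(" ")
  );
}

console.log(wordsWithTagPrefix(output, "VB")); // [ 'am', 'do', 'do' ]
console.log(wordsWithTagPrefix(output, "JJ")); // [ 'happy', 'part-of-speech' ]
console.log(toTaggedStrings(output)[1]);       // How_WRB do_VBP you_PRP do_VB ?_.
```

Matching on a tag prefix rather than an exact tag is a design choice that groups the fine-grained Penn Treebank variants (for instance all verb forms) under one query. Pass a full tag such as `"NN"` together with an equality check instead if exact matches are needed.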