https://github.com/andrefs/node-tnt-tagger
A statistical part-of-speech tagger
- Host: GitHub
- URL: https://github.com/andrefs/node-tnt-tagger
- Owner: andrefs
- Created: 2019-04-23T23:38:35.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2020-05-31T12:16:24.000Z (about 6 years ago)
- Last Synced: 2025-02-26T08:15:26.423Z (over 1 year ago)
- Language: JavaScript
- Size: 26.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
# tnt-tagger
A statistical part-of-speech tagger.
This is an implementation of Thorsten Brants' TnT tagger. TnT, which
stands for Trigrams'n'Tags, is _"an efficient statistical
part-of-speech tagger"_, and its implementation is described in the
article [TnT -- A Statistical Part-of-Speech
Tagger](http://tagh.de/tom/wp-content/uploads/brants-2000.pdf).
More precisely, **tnt-tagger** is a port of Python's [NLTK
implementation](https://www.nltk.org/_modules/nltk/tag/tnt.html) of
said tagger.
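
The trigram idea behind TnT can be illustrated with a small sketch. It is not this module's code: the function names `countTagNgrams` and `transitionProb`, the input format (arrays of tag strings), and the fixed interpolation weights are assumptions made for illustration. Brants' article estimates the weights by deleted interpolation and also handles unknown words through suffix analysis; both are left out here.

```js
// Count tag unigrams, bigrams and trigrams over tagged sentences.
// Each sentence is an array of tag strings (hypothetical input format).
function countTagNgrams(sentences) {
  const uni = new Map();
  const bi = new Map();
  const tri = new Map();
  let total = 0;
  const bump = (map, key) => map.set(key, (map.get(key) || 0) + 1);

  for (const tags of sentences) {
    for (let i = 0; i < tags.length; i++) {
      bump(uni, tags[i]);
      total++;
      if (i >= 1) bump(bi, `${tags[i - 1]} ${tags[i]}`);
      if (i >= 2) bump(tri, `${tags[i - 2]} ${tags[i - 1]} ${tags[i]}`);
    }
  }
  return { uni, bi, tri, total };
}

// Linearly interpolated estimate of P(t3 | t1, t2):
//   l1 * P(t3) + l2 * P(t3 | t2) + l3 * P(t3 | t1, t2)
// The default weights are arbitrary placeholders, not estimated values.
function transitionProb(counts, t1, t2, t3, lambdas = [0.1, 0.3, 0.6]) {
  const { uni, bi, tri, total } = counts;
  const get = (map, key) => map.get(key) || 0;
  const ratio = (num, den) => (den > 0 ? num / den : 0);

  const pUni = ratio(get(uni, t3), total);
  const pBi = ratio(get(bi, `${t2} ${t3}`), get(uni, t2));
  const pTri = ratio(get(tri, `${t1} ${t2} ${t3}`), get(bi, `${t1} ${t2}`));

  const [l1, l2, l3] = lambdas;
  return l1 * pUni + l2 * pBi + l3 * pTri;
}

// Usage: two toy tag sequences, then the probability of N following "N V".
const counts = countTagNgrams([
  ['N', 'V', 'N'],
  ['N', 'V', 'ART', 'N']
]);
console.log(transitionProb(counts, 'N', 'V', 'N'));
```

Mixing the three estimates keeps the probability non-zero for tag trigrams that never occurred in training, as long as the shorter contexts were seen.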
This is currently a work in progress. Future work includes refactoring
the code into more idiomatic JavaScript (for now, it feels a bit
artificial because it was translated directly from Python).
## Installation
```bash
$ npm install tnt-tagger
```
## Usage
```js
// From inside this repository; after `npm install tnt-tagger`, use require('tnt-tagger').
const TnT = require('./index');
const { Sentence, Token } = require('cetem-publico');
const ts = [new Sentence(1, [
  new Token('Jersei', {pos: 'N'}),
  new Token('atinge', {pos: 'V'}),
  new Token('média', {pos: 'N'}),
  new Token('de', {pos: 'PREP'}),
  new Token('Cr$', {pos: 'CUR'}),
  new Token('1,4', {pos: 'NUM'}),
  new Token('milhão', {pos: 'N'}),
  new Token('em', {pos: 'PREP|+'}),
  new Token('a', {pos: 'ART'}),
  new Token('venda', {pos: 'N'}),
  new Token('de', {pos: 'PREP|+'}),
  new Token('a', {pos: 'ART'}),
  new Token('Pinhal', {pos: 'NPROP'}),
  new Token('em', {pos: 'PREP'}),
  new Token('São', {pos: 'NPROP'}),
  new Token('Paulo', {pos: 'NPROP'})
])];
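const corpus = {
  // Generator that yields the training sentences one by one.
  sentences: function* () {
    const n = ts.length;
    for (let i = 0; i < n; i++) {
      yield ts[i];
    }
  }
};

// Assumption: `TnT` takes no constructor arguments and exposes a
// promise-returning `train(corpus)`, mirroring NLTK's `TnT.train`;
// `s` is assumed to be the sentence to tag.
const t = new TnT();
const s = ts[0];
t.train(corpus)
  .then(() => t.tag(s))
  .then(console.log);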
```
## Methods
## TODO
## Acknowledgements
Thanks to Thorsten Brants for the original version of this algorithm,
and to NLTK's team for the implementation on which this module is
based.
## Bugs and stuff
Open a GitHub issue or, preferably, send me a pull request.
## License
MIT