https://github.com/andrefs/node-tnt-tagger

A statistical part-of-speech tagger
Last synced: 11 months ago

Host: GitHub
URL: https://github.com/andrefs/node-tnt-tagger
Owner: andrefs
Created: 2019-04-23T23:38:35.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2020-05-31T12:16:24.000Z (about 6 years ago)
Last Synced: 2025-02-26T08:15:26.423Z (over 1 year ago)
Language: JavaScript
Size: 26.4 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# tnt-tagger

A statistical part-of-speech tagger.

This is an implementation of Thorsten Brants' TnT tagger. TnT, which
stands for Trigrams'n'Tags, is _"an efficient statistical
part-of-speech tagger"_, and its implementation is described in the
article [TnT -- A Statistical Part-of-Speech
Tagger](http://tagh.de/tom/wp-content/uploads/brants-2000.pdf).

In fact, **tnt-tagger** is a port of Python's [NLTK
implementation](https://www.nltk.org/_modules/nltk/tag/tnt.html) of
said tagger.
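
As background, Brants' article describes TnT as a second-order Markov model over tags, with trigram probabilities smoothed by linearly interpolating unigram, bigram and trigram estimates. The sketch below illustrates only that interpolation step. The function names `countTagNgrams` and `interpolatedTrigramProb` are hypothetical and are not part of the **tnt-tagger** API, and the weights are made-up values (the article estimates them from the training data by deleted interpolation).

```js
// Illustrative sketch only: hypothetical helpers, not the tnt-tagger API.

// Count tag unigrams, bigrams and trigrams over tagged sentences.
// Each sentence is an array of [word, tag] pairs; tags are assumed
// not to contain spaces, since a space is used as the key separator.
function countTagNgrams(sentences) {
  const uni = new Map();
  const bi = new Map();
  const tri = new Map();
  let total = 0;
  const bump = (map, key) => map.set(key, (map.get(key) || 0) + 1);

  for (const sentence of sentences) {
    const tags = sentence.map(([, tag]) => tag);
    tags.forEach((tag, i) => {
      total += 1;
      bump(uni, tag);
      if (i >= 1) bump(bi, `${tags[i - 1]} ${tag}`);
      if (i >= 2) bump(tri, `${tags[i - 2]} ${tags[i - 1]} ${tag}`);
    });
  }
  return { uni, bi, tri, total };
}

// P(t3 | t1, t2) = l1 * P(t3) + l2 * P(t3 | t2) + l3 * P(t3 | t1, t2),
// where each P is a relative frequency and l1 + l2 + l3 = 1.
function interpolatedTrigramProb(counts, lambdas, t1, t2, t3) {
  const [l1, l2, l3] = lambdas;
  const get = (map, key) => map.get(key) || 0;
  // Treat an estimate with a zero denominator as 0.
  const ratio = (num, den) => (den > 0 ? num / den : 0);

  const pUni = ratio(get(counts.uni, t3), counts.total);
  const pBi = ratio(get(counts.bi, `${t2} ${t3}`), get(counts.uni, t2));
  const pTri = ratio(
    get(counts.tri, `${t1} ${t2} ${t3}`),
    get(counts.bi, `${t1} ${t2}`)
  );
  return l1 * pUni + l2 * pBi + l3 * pTri;
}

// Usage with a toy corpus and made-up weights.
const demoCounts = countTagNgrams([
  [['a', 'ART'], ['venda', 'N'], ['de', 'PREP']],
]);
console.log(interpolatedTrigramProb(demoCounts, [0.2, 0.3, 0.5], 'ART', 'N', 'PREP'));
```

Because the three weights sum to one, a trigram that was never seen in training still receives some probability mass from the bigram and unigram estimates, which is the point of the smoothing.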

This is currently a work in progress. Future work includes refactoring
the code to make it more idiomatic JavaScript (for now, it feels a bit
artificial due to the direct translation from Python).

## Installation

```bash
$ npm install tnt-tagger
```

## Usage

```js
const TnT = require('./index');
const {Sentence,Token} = require('cetem-publico');

const ts = [new Sentence(1, [
    new Token('Jersei', {pos: 'N'      }) ,
    new Token('atinge', {pos: 'V'      }) ,
    new Token('média',  {pos: 'N'      }) ,
    new Token('de',     {pos: 'PREP'   }) ,
    new Token('Cr$',    {pos: 'CUR'    }) ,
    new Token('1,4',    {pos: 'NUM'    }) ,
    new Token('milhão', {pos: 'N'      }) ,
    new Token('em',     {pos: 'PREP|+' }) ,
    new Token('a',      {pos: 'ART'    }) ,
    new Token('venda',  {pos: 'N'      }) ,
    new Token('de',     {pos: 'PREP|+' }) ,
    new Token('a',      {pos: 'ART'    }) ,
    new Token('Pinhal', {pos: 'NPROP'  }) ,
    new Token('em',     {pos: 'PREP'   }) ,
    new Token('São',    {pos: 'NPROP'  }) ,
    new Token('Paulo',  {pos: 'NPROP'  })
  ])];

let corpus = {
  sentences:  function*(){
    n = 1;
    for(i=0; i< /* … */ => t.tag(s))
  .then(console.log);
```
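
The `corpus` object in the usage example exposes its training data through a generator function called `sentences`. The following is a minimal, self-contained sketch of that pattern using plain objects in place of the `Sentence` and `Token` classes. The `{word, pos}` shape and the `countTags` consumer are assumptions made for illustration; how **tnt-tagger** itself consumes the corpus is not shown here.

```js
// Illustrative sketch only: plain-object stand-ins for Sentence and Token.
const sentencesData = [
  [
    { word: 'em', pos: 'PREP' },
    { word: 'São', pos: 'NPROP' },
    { word: 'Paulo', pos: 'NPROP' },
  ],
];

const corpus = {
  // Generator function: yields one sentence at a time, so a large corpus
  // does not have to be held in memory all at once.
  sentences: function* () {
    for (let i = 0; i < sentencesData.length; i++) {
      yield sentencesData[i];
    }
  },
};

// Hypothetical consumer: counts how often each tag occurs in the corpus.
function countTags(corpus) {
  const counts = {};
  for (const sentence of corpus.sentences()) {
    for (const { pos } of sentence) {
      counts[pos] = (counts[pos] || 0) + 1;
    }
  }
  return counts;
}

console.log(countTags(corpus));
```

Each call to `corpus.sentences()` returns a fresh iterator, so the same corpus object can be traversed more than once.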

## Methods

## TODO

## Acknowledgements

Thanks to Thorsten Brants for the original version of this algorithm,
and to the NLTK team for the implementation on which this module is
based.

## Bugs and stuff

Open a GitHub issue or, preferably, send me a pull request.

## License

MIT
