https://github.com/mgproduction/mgtagger

A small, generic, single-C-source-code POS tagger, featuring ngrams with most common word spice, with Viterbi-like code.
c part-of-speech-tagger postagger tagger

Host: GitHub
URL: https://github.com/mgproduction/mgtagger
Owner: MGProduction
License: apache-2.0
Created: 2017-11-10T19:41:27.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2017-11-11T17:24:14.000Z (over 8 years ago)
Last Synced: 2025-02-25T11:46:47.058Z (over 1 year ago)
Topics: c, part-of-speech-tagger, postagger, tagger
Language: C
Homepage:
Size: 7.68 MB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

mgtagger
=====

*mgtagger* is a small, generic POS tagger written as a single C source file. It uses n-grams, spiced with most-common-word information, and Viterbi-like decoding.
It can learn a language from CoNLL-U files or from inline-tagged files.
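
To illustrate what Viterbi-style decoding over tag n-grams generally looks like, here is a minimal bigram sketch in C. It is not taken from mgtagger's source: the function name `viterbi_decode`, the constants `NTAGS` and `MAXLEN`, and the log-score tables are all hypothetical, and the most-common-word component is left out.

```c
#include <float.h>
#include <stddef.h>

#define NTAGS  3    /* hypothetical tag set size, e.g. DT=0, NN=1, VB=2 */
#define MAXLEN 64   /* hypothetical maximum sentence length */

/* Bigram Viterbi over log scores (higher is better).
 * start[t]    : score of tag t at the first token
 * trans[p][t] : score of tag t following tag p
 * emit[i][t]  : score of token i carrying tag t
 * Writes the best tag sequence to out_tags[0..n-1].
 * Returns 0 on success, -1 if n is 0 or larger than MAXLEN. */
int viterbi_decode(size_t n,
                   const double start[NTAGS],
                   const double trans[NTAGS][NTAGS],
                   const double emit[][NTAGS],
                   int out_tags[])
{
    double score[MAXLEN][NTAGS];  /* best score of a path ending in tag t */
    int back[MAXLEN][NTAGS];      /* previous tag on that best path */
    size_t i;
    int t, p, last;

    if (n == 0 || n > MAXLEN)
        return -1;

    for (t = 0; t < NTAGS; t++) {
        score[0][t] = start[t] + emit[0][t];
        back[0][t] = -1;
    }

    for (i = 1; i < n; i++) {
        for (t = 0; t < NTAGS; t++) {
            double best = -DBL_MAX;
            int best_prev = 0;
            for (p = 0; p < NTAGS; p++) {
                double s = score[i - 1][p] + trans[p][t];
                if (s > best) {
                    best = s;
                    best_prev = p;
                }
            }
            score[i][t] = best + emit[i][t];
            back[i][t] = best_prev;
        }
    }

    /* Pick the best final tag, then follow the back-pointers. */
    last = 0;
    for (t = 1; t < NTAGS; t++)
        if (score[n - 1][t] > score[n - 1][last])
            last = t;
    out_tags[n - 1] = last;
    for (i = n - 1; i > 0; i--)
        out_tags[i - 1] = back[i][out_tags[i]];

    return 0;
}
```

In this sketch, context can override the locally best tag: a token whose emission score slightly prefers VB is still tagged NN when the previous tag is DT and the DT-to-VB transition score is low.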

The source code in this repository is provided under the terms of the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0.html).

## Information

*mgtagger* can learn what it needs for POS tagging from an inline POS-tagged file (`the/DT cat/NN is/VBZ on/IN the/DT table/NN`) or from CoNLL-U files (in which case you can select which feature set to use, and you also get base forms in the output).
After the quick learning phase it generates (and can later load) a text .mg file containing the lexicon and the n-grams.
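
The two training formats can be read with a few lines of C each. The sketch below is only an illustration and is not mgtagger's own reader: the names `split_tagged_token` and `split_conllu_line` are hypothetical, and splitting an inline token at its last `/` is an assumption about how words containing slashes should be handled. The CoNLL-U column order (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) follows the Universal Dependencies format; the choice of "feature set" presumably corresponds to picking one of the tag columns, and the base form to LEMMA.

```c
#include <stddef.h>
#include <string.h>

/* Splits one inline-tagged token such as "cat/NN" at its LAST '/',
 * so a word that itself contains a slash ("1/2/CD") stays intact.
 * Returns 0 on success, -1 if there is no '/', if either part is
 * empty, or if a part does not fit its buffer. */
int split_tagged_token(const char *token,
                       char *word, size_t word_size,
                       char *tag, size_t tag_size)
{
    const char *slash = strrchr(token, '/');
    size_t word_len, tag_len;

    if (slash == NULL)
        return -1;
    word_len = (size_t)(slash - token);
    tag_len = strlen(slash + 1);
    if (word_len == 0 || tag_len == 0)
        return -1;
    if (word_len >= word_size || tag_len >= tag_size)
        return -1;

    memcpy(word, token, word_len);
    word[word_len] = '\0';
    memcpy(tag, slash + 1, tag_len + 1);  /* copies the terminator too */
    return 0;
}

#define CONLLU_FIELDS 10
enum { CONLLU_ID, CONLLU_FORM, CONLLU_LEMMA, CONLLU_UPOS, CONLLU_XPOS };

/* Splits one CoNLL-U token line in place at its tabs and stores a
 * pointer to each column in fields[]. Returns 0 if the line has
 * exactly 10 columns, -1 for comments, blank lines and malformed
 * lines. The line buffer is modified. */
int split_conllu_line(char *line, char *fields[CONLLU_FIELDS])
{
    int n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';   /* drop the line ending */
    if (*line == '\0' || *line == '#')
        return -1;

    for (;;) {
        char *tab = strchr(p, '\t');
        if (n == CONLLU_FIELDS)
            return -1;                    /* more than 10 columns */
        fields[n++] = p;
        if (tab == NULL)
            break;
        *tab = '\0';
        p = tab + 1;
    }
    return n == CONLLU_FIELDS ? 0 : -1;
}
```

A training loop would call `split_tagged_token` on each whitespace-separated item of an inline file, or `split_conllu_line` on each line of a CoNLL-U file and then read `fields[CONLLU_FORM]`, `fields[CONLLU_LEMMA]` and one of the tag columns.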

It works natively in *UTF-8*, but you can switch it to a codepage by changing this setting in the code.

To use it in your project, simply add *mgtagger_postag.c* plus *mgtagger_private.h* / *mgtagger.h* to it.

At the moment *mgtagger* doesn't do tokenization (although it has a basic built-in tokenizer that may fit some languages, certainly not Japanese, Chinese or Thai); it just assigns a POS to each token after its analysis.
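
Because the tagger expects tokens, a caller has to split the text first. The following is a minimal whitespace splitter, offered only as a sketch: `split_on_whitespace` is a hypothetical helper, not part of mgtagger, and it is not the built-in tokenizer mentioned above. It is safe on UTF-8 input because every byte of a multi-byte sequence is 0x80 or above and so never matches an ASCII space, but it does not separate punctuation from words and is of no use for languages written without spaces.

```c
#include <stddef.h>

/* ASCII whitespace only; bytes >= 0x80 (UTF-8 multi-byte) never match. */
static int is_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Splits text in place on runs of ASCII whitespace. Each token is
 * NUL-terminated inside the original buffer and its start is stored
 * in tokens[]. Returns the number of tokens stored, which is at most
 * max_tokens; any text beyond that limit is left unsplit. */
size_t split_on_whitespace(char *text, char *tokens[], size_t max_tokens)
{
    size_t n = 0;
    char *p = text;

    while (*p != '\0' && n < max_tokens) {
        while (is_space((unsigned char)*p))
            p++;                          /* skip leading whitespace */
        if (*p == '\0')
            break;
        tokens[n++] = p;                  /* a token starts here */
        while (*p != '\0' && !is_space((unsigned char)*p))
            p++;
        if (*p != '\0')
            *p++ = '\0';                  /* terminate the token */
    }
    return n;
}
```

The input buffer must be writable, since the function overwrites the first whitespace byte after each token with a terminator.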
