https://github.com/gsriram7/pos

Part of speech tagger using HMM and Viterbi algorithm
hmm hmm-viterbi-algorithm part-of-speech-tagger viterbi-algorithm

Host: GitHub
URL: https://github.com/gsriram7/pos
Owner: gsriram7
Created: 2018-03-06T04:15:11.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2018-03-06T04:40:11.000Z (over 7 years ago)
Last Synced: 2024-12-27T13:40:40.859Z (6 months ago)
Topics: hmm, hmm-viterbi-algorithm, part-of-speech-tagger, viterbi-algorithm
Language: Python
Homepage: http://ron.artstein.org/csci544-2018/coding-1.html
Size: 4.22 MB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Support: support_files/1.txt

README

# Part of Speech Tagger

The tagger uses a Hidden Markov Model (HMM) to encode a language corpus in which every word is labeled with its part-of-speech tag.
It then uses the Viterbi algorithm to decode and tag sentences from the test data.
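
As a rough illustration of the decoding step, here is a minimal Viterbi sketch for a first-order HMM. It is not the repository's code: the function name `viterbi`, the tuple-keyed probability tables, the `floor` value used for unseen events, and the toy probabilities are all assumptions made for this example.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for `words` under a first-order HMM.

    Assumed (hypothetical) table layout:
      start_p[tag], trans_p[(prev_tag, tag)], emit_p[(tag, word)]
    """
    if not words:
        return []

    def lp(table, key):
        # Log-probability lookup; unseen events get a small floor (an assumption).
        return math.log(table.get(key, floor))

    # best[i][t] = best log-score of any tag path that ends in tag t at position i
    best = [{t: lp(start_p, t) + lp(emit_p, (t, words[0])) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p, (p, t))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p, (t, words[i]))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy, hand-picked probabilities for illustration only.
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {("DT", "NN"): 0.9, ("NN", "VB"): 0.8, ("VB", "DT"): 0.5}
emit_p = {("DT", "the"): 0.9, ("NN", "dog"): 0.5, ("VB", "barks"): 0.5}
print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
# -> ['DT', 'NN', 'VB']
```

Working in log space avoids numeric underflow on long sentences; how the actual tagger handles unseen words and transitions is not stated here, so the flat floor is only a placeholder.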

The encoder is generic and it works for ***ANY*** language.

The encoder models the [corpus](en_train_tagged.txt) and writes the probabilities into [hmmmodel.txt](hmmmodel.txt).
The decoder consumes the model, tags the [test data](en_dev_raw.txt), and writes the output into [hmmoutput.txt](hmmoutput.txt).
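
The encoding step can be sketched as relative-frequency counting over a tagged corpus. The sketch below is an illustration under stated assumptions, not the repository's implementation: it assumes each line is one sentence of space-separated `word/TAG` tokens, and the function name `train_hmm` and the returned table layout are hypothetical. It does not reproduce the format of `hmmmodel.txt`, which is not described here.

```python
from collections import Counter

def train_hmm(lines):
    """Estimate start, transition, and emission probabilities by counting.

    Assumes (hypothetically) one sentence per line, tokens written as word/TAG.
    """
    start, trans, emit = Counter(), Counter(), Counter()
    tag_total = Counter()   # occurrences of each tag (emission denominator)
    prev_total = Counter()  # times each tag is followed by another tag
    n_sentences = 0
    for line in lines:
        prev = None
        for token in line.split():
            # Split on the last slash so words containing "/" stay intact.
            word, tag = token.rsplit("/", 1)
            if prev is None:
                start[tag] += 1
                n_sentences += 1
            else:
                trans[(prev, tag)] += 1
                prev_total[prev] += 1
            emit[(tag, word)] += 1
            tag_total[tag] += 1
            prev = tag
    start_p = {t: c / n_sentences for t, c in start.items()}
    trans_p = {(p, t): c / prev_total[p] for (p, t), c in trans.items()}
    emit_p = {(t, w): c / tag_total[t] for (t, w), c in emit.items()}
    return start_p, trans_p, emit_p

# Two toy sentences, invented for this example.
corpus = ["the/DT dog/NN barks/VB", "the/DT cat/NN sleeps/VB"]
start_p, trans_p, emit_p = train_hmm(corpus)
print(trans_p[("DT", "NN")])  # -> 1.0
print(emit_p[("NN", "dog")])  # -> 0.5
```

Nothing in this counting scheme depends on the language of the corpus, which is consistent with the encoder being language-independent. A practical tagger would also need smoothing for unseen transitions and words, which this sketch omits.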

## Accuracy of the model trained on the given corpora

* English - 88.93%
* Chinese - 87.08%
* Hindi - 92.34%

These accuracies were obtained using a single generic encoder for all three languages.
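
How the figures above were computed is not stated. One common way to score a tagger is per-token accuracy against a gold-tagged file, sketched below. The function name `tag_accuracy` and the `word/TAG` line format are assumptions for this example, not details taken from the project.

```python
def tag_accuracy(predicted_lines, gold_lines):
    """Fraction of tokens whose predicted tag matches the gold tag.

    Assumes (hypothetically) aligned lines of space-separated word/TAG tokens.
    """
    correct = total = 0
    for pred, gold in zip(predicted_lines, gold_lines):
        for p_tok, g_tok in zip(pred.split(), gold.split()):
            total += 1
            # Compare only the tag, i.e. the text after the last slash.
            if p_tok.rsplit("/", 1)[-1] == g_tok.rsplit("/", 1)[-1]:
                correct += 1
    return correct / total if total else 0.0

# Invented example: two of three tags agree.
pred = ["the/DT dog/NN barks/NN"]
gold = ["the/DT dog/NN barks/VB"]
print(f"{tag_accuracy(pred, gold):.2%}")  # -> 66.67%
```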
