https://github.com/gsriram7/pos
Part of speech tagger using HMM and Viterbi algorithm
- Host: GitHub
- URL: https://github.com/gsriram7/pos
- Owner: gsriram7
- Created: 2018-03-06T04:15:11.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-03-06T04:40:11.000Z (over 7 years ago)
- Last Synced: 2024-12-27T13:40:40.859Z (6 months ago)
- Topics: hmm, hmm-viterbi-algorithm, part-of-speech-tagger, viterbi-algorithm
- Language: Python
- Homepage: http://ron.artstein.org/csci544-2018/coding-1.html
- Size: 4.22 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Support: support_files/1.txt
README
# Part of Speech Tagger
The tagger uses a Hidden Markov Model (HMM) to encode a language corpus in which each word is tagged with its part-of-speech tag.
It then uses the Viterbi algorithm to decode and tag sentences from the test data. The encoder is generic and works for ***ANY*** language that has a tagged training corpus.
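The decoding step can be illustrated with a minimal Viterbi sketch for a first-order (bigram) HMM. This is not the repository's code: the function name `viterbi`, the dictionary layout (`start`, `trans`, `emit`), and the rule of ignoring emission scores for words never seen in training are assumptions made for illustration. Scores are kept as log-probabilities so that long sentences do not underflow.

```python
import math
from typing import Dict, List

# Assumed (hypothetical) model layout, not the repository's hmmmodel.txt format:
#   start[t]    = P(first tag is t)
#   trans[p][t] = P(tag t | previous tag p)
#   emit[t][w]  = P(word w | tag t)


def _log(p: float) -> float:
    """Log-probability; zero probability maps to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words: List[str], tags: List[str], start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most probable tag sequence for `words` under a first-order HMM."""
    if not words:
        return []

    def emission(tag: str, word: str) -> float:
        # Heuristic (an assumption): a word unseen under every tag contributes
        # log(1) = 0, so only the transition scores decide its tag.
        if not any(word in emit.get(t, {}) for t in tags):
            return 0.0
        return _log(emit.get(tag, {}).get(word, 0.0))

    def transition(prev: str, tag: str) -> float:
        return _log(trans.get(prev, {}).get(tag, 0.0))

    # best[i][t]: log-score of the best tag path for words[:i+1] ending in tag t
    best = [{t: _log(start.get(t, 0.0)) + emission(t, words[0]) for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for t in tags:
            # Best predecessor tag for t, using the previous column best[-1]
            prev = max(tags, key=lambda p: best[-1][p] + transition(p, t))
            scores[t] = best[-1][prev] + transition(prev, t) + emission(t, words[i])
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Follow back-pointers from the highest-scoring final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    tags = ["DT", "NN", "VB"]
    start = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
    trans = {"DT": {"NN": 0.9, "VB": 0.1},
             "NN": {"DT": 0.2, "NN": 0.2, "VB": 0.6},
             "VB": {"DT": 0.5, "NN": 0.5}}
    emit = {"DT": {"the": 1.0},
            "NN": {"dog": 0.5, "walk": 0.4, "barks": 0.1},
            "VB": {"barks": 0.7, "walk": 0.3}}
    print(viterbi(["the", "dog", "barks"], tags, start, trans, emit))
    # prints ['DT', 'NN', 'VB']
```

The toy model is invented for the example. Dynamic programming keeps only the best path into each tag at each position, so decoding costs O(n·T²) for n words and T tags instead of enumerating all Tⁿ tag sequences.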
The encoder models the [corpus](en_train_tagged.txt) and writes the learned probabilities to [hmmmodel.txt](hmmmodel.txt).
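The encoding step amounts to counting tag-to-tag transitions and word emissions in the tagged corpus and normalising those counts into probabilities. The sketch below is a hedged illustration, not the repository's encoder. It assumes each corpus line is one sentence of space-separated `word/TAG` tokens, applies add-one smoothing to start and transition probabilities (a choice made for this sketch), and writes JSON, whereas the actual layout of `hmmmodel.txt` is not documented here. The names `parse_line`, `train`, and `write_model` are hypothetical.

```python
import json
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_line(line: str) -> List[Tuple[str, str]]:
    """Split an assumed 'word/TAG word/TAG ...' line into (word, tag) pairs.

    rsplit on the last '/' keeps words that themselves contain '/' intact.
    """
    pairs = []
    for token in line.split():
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


def train(lines: Iterable[str]) -> dict:
    """Estimate start, transition and emission probabilities from tagged lines."""
    start_counts: Counter = Counter()
    trans_counts: Dict[str, Counter] = defaultdict(Counter)
    emit_counts: Dict[str, Counter] = defaultdict(Counter)

    for line in lines:
        pairs = parse_line(line)
        if not pairs:
            continue  # skip blank lines
        start_counts[pairs[0][1]] += 1
        for (_, prev_tag), (_, tag) in zip(pairs, pairs[1:]):
            trans_counts[prev_tag][tag] += 1
        for word, tag in pairs:
            emit_counts[tag][word] += 1

    tags = sorted(emit_counts)
    n = len(tags)

    # Add-one smoothing: unseen starts/transitions keep a small nonzero probability
    total_starts = sum(start_counts.values())
    start = {t: (start_counts[t] + 1) / (total_starts + n) for t in tags}
    trans = {}
    for prev_tag in tags:
        row_total = sum(trans_counts[prev_tag].values())
        trans[prev_tag] = {t: (trans_counts[prev_tag][t] + 1) / (row_total + n)
                           for t in tags}

    # Emissions: unsmoothed relative frequency of each word under each tag
    emit = {}
    for t in tags:
        tag_total = sum(emit_counts[t].values())
        emit[t] = {w: c / tag_total for w, c in emit_counts[t].items()}

    return {"tags": tags, "start": start, "trans": trans, "emit": emit}


def write_model(model: dict, path: str = "hmmmodel.txt") -> None:
    """Serialise the model as JSON; ensure_ascii=False keeps non-Latin words readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f, ensure_ascii=False)
```

Nothing in this sketch depends on the language of the corpus, which is consistent with the README's claim of a single generic encoder; writing with `ensure_ascii=False` and UTF-8 keeps Chinese or Hindi tokens legible in the model file.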
The decoder reads the model, tags the [test data](en_dev_raw.txt), and writes the output to [hmmoutput.txt](hmmoutput.txt).

## Accuracy of the models trained on the given corpora
* English - 88.93%
* Chinese - 87.08%
* Hindi - 92.34%
These accuracies were obtained with the same generic encoder, unchanged across all three languages.
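The README does not state how these percentages were computed. Token-level accuracy, the share of words whose predicted tag matches a reference tag, is the usual metric for part-of-speech tagging, and the sketch below assumes it. The function name `tag_accuracy` and the need for a reference file in the same `word/TAG` format as the training data are assumptions, not part of the repository.

```python
from typing import Iterable


def tag_accuracy(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Token-level accuracy: share of tokens whose predicted tag equals the gold tag.

    Both inputs are sentences in the assumed 'word/TAG word/TAG ...' format,
    aligned line by line.
    """
    correct = total = 0
    for pred_line, gold_line in zip(predicted, gold):
        pred_tags = [tok.rsplit("/", 1)[1] for tok in pred_line.split()]
        gold_tags = [tok.rsplit("/", 1)[1] for tok in gold_line.split()]
        if len(pred_tags) != len(gold_tags):
            raise ValueError("sentence length differs between output and reference")
        correct += sum(p == g for p, g in zip(pred_tags, gold_tags))
        total += len(gold_tags)
    return correct / total if total else 0.0


# Usage: 5 of the 6 tags agree, so the accuracy is 5/6
pred = ["the/DT dog/NN barks/NN", "a/DT cat/NN sleeps/VB"]
ref = ["the/DT dog/NN barks/VB", "a/DT cat/NN sleeps/VB"]
print(f"{tag_accuracy(pred, ref):.2%}")  # prints 83.33%
```

Counting per token rather than per sentence means a single wrong tag in a long sentence costs only that one token, which is why POS accuracies like those above are typically high even when many sentences contain at least one error.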