https://github.com/eljandoubi/part-of-speech-tagging
Hidden Markov model for part of speech tagging
hidden-markov-models machine-learning natural-language-processing nlp part-of-speech-tagging pomegranate
- Host: GitHub
- URL: https://github.com/eljandoubi/part-of-speech-tagging
- Owner: eljandoubi
- License: other
- Created: 2022-10-05T17:39:10.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-10-05T17:57:53.000Z (over 2 years ago)
- Last Synced: 2024-04-17T15:09:15.775Z (9 months ago)
- Topics: hidden-markov-models, machine-learning, natural-language-processing, nlp, part-of-speech-tagging, pomegranate
- Homepage:
- Size: 3.24 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
- Codeowners: CODEOWNERS
# Hidden Markov Model Part of Speech tagger - Udacity project
## Introduction
Part-of-speech tagging is the process of determining the syntactic category of a word from the words in its surrounding context. It is often used to help disambiguate natural language phrases because it can be done quickly and with high accuracy. Tagging supports many NLP tasks, such as determining the correct pronunciation during speech synthesis (for example, _dis_-count as a noun vs. dis-_count_ as a verb), information retrieval, and word sense disambiguation.
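The role of context can be made concrete with a small sketch. The snippet below uses a hypothetical three-sentence toy corpus (made up for illustration, not the project's data) to count how often each word appears with each universal tag, then applies a most-frequent-tag baseline that ignores context. Because "discount" was seen more often as a noun, the baseline tags it NOUN even in "stores discount stock", where it is a verb. Fixing this kind of error is what a context-aware model such as a hidden Markov model is for.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus (made up for illustration) using universal tags.
# Each sentence is a list of (word, tag) pairs.
TOY_CORPUS = [
    [("the", "DET"), ("discount", "NOUN"), ("ends", "VERB"), ("today", "NOUN")],
    [("stores", "NOUN"), ("discount", "VERB"), ("old", "ADJ"), ("stock", "NOUN")],
    [("a", "DET"), ("discount", "NOUN"), ("helps", "VERB")],
]


def tag_counts_per_word(corpus):
    """Count how often each (lower-cased) word appears with each tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return counts


def most_frequent_tag_baseline(words, counts, default="NOUN"):
    """Tag each word with its most frequent training tag, ignoring context.

    Words never seen in training get `default` (an arbitrary choice here).
    """
    tags = []
    for word in words:
        word_counts = counts.get(word.lower())  # .get avoids inserting new keys
        tags.append(word_counts.most_common(1)[0][0] if word_counts else default)
    return tags


counts = tag_counts_per_word(TOY_CORPUS)
print(dict(counts["discount"]))  # {'NOUN': 2, 'VERB': 1}
print(most_frequent_tag_baseline(["stores", "discount", "stock"], counts))
# ['NOUN', 'NOUN', 'NOUN']: "discount" should be VERB here
```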
In this notebook, we'll use the [Pomegranate](http://pomegranate.readthedocs.io/) library to build a hidden Markov model for part-of-speech tagging with a "universal" tagset. Hidden Markov models have been able to achieve [>96% tag accuracy with larger tagsets on realistic text corpora](http://www.coli.uni-saarland.de/~thorsten/publications/Brants-ANLP00.pdf). Hidden Markov models have also been used for speech recognition and speech generation, machine translation, gene recognition in bioinformatics, human gesture recognition in computer vision, and more.
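Once a hidden Markov model is built, tagging a sentence means finding the most probable sequence of hidden tags for the observed words, which is conventionally done with Viterbi decoding. The sketch below is plain Python rather than the notebook's Pomegranate code, and everything in it is an assumption for illustration: the three-tag model, its start, transition, and emission probabilities (all made up, whereas a real tagger estimates them from a tagged corpus), and the `UNSEEN` floor used for words a tag never emitted.

```python
import math

# Hypothetical toy HMM over three universal tags; all probabilities are made up.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}  # P(first tag)
TRANSITION = {  # P(next tag | current tag)
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMISSION = {  # P(word | tag)
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"discount": 0.4, "stores": 0.3, "stock": 0.3},
    "VERB": {"discount": 0.5, "ends": 0.5},
}
UNSEEN = 1e-6  # assumed probability floor for (tag, word) pairs never observed


def log_emit(tag, word):
    """Log emission probability, falling back to the UNSEEN floor."""
    return math.log(EMISSION[tag].get(word, UNSEEN))


def viterbi(words):
    """Return the most likely tag sequence for `words` under the toy HMM."""
    if not words:
        return []
    # best[i][tag]: highest log-probability of any tag path ending in `tag` at word i
    best = [{t: math.log(START[t]) + log_emit(t, words[0]) for t in TAGS}]
    backptr = [{}]
    for i in range(1, len(words)):
        best.append({})
        backptr.append({})
        for tag in TAGS:
            # Score every possible previous tag and keep the best one.
            scores = {p: best[i - 1][p] + math.log(TRANSITION[p][tag]) for p in TAGS}
            prev = max(scores, key=scores.get)
            best[i][tag] = scores[prev] + log_emit(tag, words[i])
            backptr[i][tag] = prev
    # Start from the best final tag and follow the back-pointers.
    path = [max(TAGS, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(backptr[i][path[-1]])
    return list(reversed(path))


print(viterbi(["the", "discount", "ends"]))      # ['DET', 'NOUN', 'VERB']
print(viterbi(["stores", "discount", "stock"]))  # ['NOUN', 'VERB', 'NOUN']
```

The same word "discount" comes out as NOUN after "the" and as VERB after "stores", because the transition probabilities bring in the neighboring tags. Working with log-probabilities avoids numerical underflow when many small probabilities are multiplied over long sentences.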
![](_post-hmm.png)
## Getting Started
0. (Optional) The provided code includes a function for drawing the network graph that depends on [GraphViz](http://www.graphviz.org/). You must manually install the GraphViz executable for your OS before the steps below or the drawing function will not work.
1. Open a terminal and clone the project repository:
   ```
   $ git clone https://github.com/eljandoubi/Part-of-Speech-Tagging.git
   ```
2. Switch to the project folder (named after the repository you just cloned) and create a conda environment (note: you must already have Anaconda installed):
   ```
   $ cd Part-of-Speech-Tagging
   Part-of-Speech-Tagging/ $ conda env create -f hmm-tagger.yaml
   ```
3. Activate the conda environment, then start the Jupyter Notebook server. (Note: Windows users should run `activate hmm-tagger` instead; with conda 4.4 or later, `conda activate hmm-tagger` works on all platforms.)
   ```
   Part-of-Speech-Tagging/ $ source activate hmm-tagger
   (hmm-tagger) Part-of-Speech-Tagging/ $ jupyter notebook
   ```
   Depending on your system settings, Jupyter will either open a browser window, or the terminal will print a URL with a security token. If the terminal prints a URL, copy it and paste it into a browser window to load the Jupyter file browser. Once the Jupyter browser has loaded, select the project notebook (`HMM tagger.ipynb`) and follow the instructions inside to complete the project.
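Before opening the notebook, it can help to confirm that the optional GraphViz executable and the main Python dependency are visible from the activated environment. The `check_environment` helper below is hypothetical (it is not part of the repository). It checks for GraphViz's `dot` program on `PATH` with `shutil.which`, and uses `importlib.util.find_spec` to see whether each package can be imported without actually importing it. The default package list is an assumption; adjust it to match `hmm-tagger.yaml`.

```python
import importlib.util
import shutil


def check_environment(packages=("pomegranate", "numpy")):
    """Report whether GraphViz's `dot` and the given top-level packages are available.

    The default package names are assumptions; compare them with hmm-tagger.yaml.
    """
    report = {"graphviz_dot": shutil.which("dot") is not None}
    for name in packages:
        # find_spec returns None (without raising) for a missing top-level package
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item:15} {'found' if ok else 'MISSING'}")
```

A missing `graphviz_dot` entry only affects the optional network-drawing function; a missing `pomegranate` means the conda environment was not created or activated correctly.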