https://github.com/vaasudevans/natural-language-processing-assignments

UNB Fall-2018 NLP Assignments 💬

# Natural-Language-Processing-Assignments
University of New Brunswick Fall-2018 CS6765: Natural Language Processing

This repository contains the Python code for the Fall term assignments.
The code is written in Python 2.7 using only built-in modules; numpy and nltk are not used anywhere.
The one exception is sklearn, which is used only in Assignment 3 for Logistic Regression.

## Getting started

| No | Python-file | Usage |
|:-:|:-:|:-:|
| 1 | tokenize.py<br>count.py | python tokenize.py FILE > FILE.tokens<br>python count.py FILE.tokens > FILE.freqs |
| 2 | lm.py<br>perplexity.py | python lm.py MODEL TRAIN_FILE TEST_FILE > OUTPUT<br>python perplexity.py OUTPUT |
| 3 | classify.py<br>score.py | python classify.py METHOD TRAIN_DOCS TRAIN_CLASSES TEST_DOCS > PREDICTED_CLASSES<br>python score.py PREDICTED_CLASSES TRUE_CLASSES |
| 4 | tag.py<br>accuracy.py | python tag.py TRAIN_FILE TEST_FILE METHOD > SYSTEM_OUTPUT<br>python accuracy.py TRUE_TAGS SYSTEM_OUTPUT |
| 5 | chatbot.py | python chatbot.py METHOD |
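
As a rough illustration of the tokenize-then-count workflow in row 1, here is a minimal sketch using only built-in modules. The regular expression and the function names `tokenize` and `frequencies` are assumptions made for this example; the repository's own tokenization rules are not described here and may differ.

```python
import re
from collections import Counter

# Hypothetical rule: a token is a run of word characters or a single
# punctuation mark. The actual tokenize.py may use different rules.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return TOKEN_RE.findall(text.lower())


def frequencies(tokens):
    """Return (token, count) pairs sorted from most to least frequent."""
    return Counter(tokens).most_common()
```

Calling `frequencies(tokenize(line))` on each line of a file and merging the counts would give a frequency list comparable in spirit to `FILE.freqs`.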

## Arguments

| No | Arguments | Value or File-Location (in Individual Assignment folder) |
|:-:|:-:|:-:|
| 1 | FILE | Data/tweets-en.txt.gz |
| 2 | MODEL<br>TRAIN_FILE<br>TEST_FILE | 1 or 2 or interp<br>Data/reuters-train.txt<br>Data/reuters-dev.txt |
| 3 | METHOD<br>TRAIN_DOCS<br>TRAIN_CLASSES<br>TEST_DOCS<br>TRUE_CLASSES | baseline or lr or lexicon or nb or nbbin<br>Data/train.docs.txt<br>Data/train.classes.txt<br>Data/dev.docs.txt<br>Data/dev.classes.txt |
| 4 | TRAIN_FILE<br>TEST_FILE<br>METHOD<br>TRUE_TAGS | Data/train.en.txt<br>Data/dev.en.words.txt<br>baseline or hmm<br>Data/dev.en.tags.txt |
| 5 | METHOD | overlap or w2v or both |

Assignment 2: MODEL

* 1 represents Unigram (with Add-1 smoothing)
* 2 represents Bigram (with Add-k smoothing)
* interp represents Interpolated (both Unigram and Bigram)
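
The three model options can be sketched as follows. This is a minimal illustration, not the code in `lm.py`: the function names, the boundary markers `<s>` and `</s>`, the value of `k`, and the interpolation weight `lam` are all assumptions, and unknown-word handling is left out.

```python
import math
from collections import Counter


def train_counts(sentences):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def unigram_prob(w, unigrams):
    """Add-1 smoothed unigram probability."""
    total = sum(unigrams.values())
    return (unigrams[w] + 1.0) / (total + len(unigrams))


def bigram_prob(prev, w, unigrams, bigrams, k=0.1):
    """Add-k smoothed bigram probability P(w | prev); k is an assumed value."""
    return (bigrams[(prev, w)] + k) / (unigrams[prev] + k * len(unigrams))


def interp_prob(prev, w, unigrams, bigrams, lam=0.5, k=0.1):
    """Linear interpolation of the bigram and unigram estimates."""
    return (lam * bigram_prob(prev, w, unigrams, bigrams, k)
            + (1.0 - lam) * unigram_prob(w, unigrams))


def perplexity(sentences, unigrams, bigrams):
    """Perplexity of the interpolated model: exp of the mean negative log-probability."""
    log_sum, n = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for prev, w in zip(padded, padded[1:]):
            log_sum += math.log(interp_prob(prev, w, unigrams, bigrams))
            n += 1
    return math.exp(-log_sum / n)
```

In practice `k` and `lam` would be tuned on held-out data such as the dev file.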

Assignment 3: METHOD

* baseline represents Most-Frequent-Class-Baseline
* lr represents Logistic Regression (from sklearn)
* lexicon represents a Sentiment Lexicon containing positive (+) and negative (-) words
* nb represents Naive Bayes Model (with add-k smoothing)
* nbbin represents Binarized Naive Bayes
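
To make the difference between `nb` and `nbbin` concrete, here is a minimal sketch of multinomial Naive Bayes with add-k smoothing, where the binarized variant counts each word at most once per document. The function names, the model tuple, and the default `k` are assumptions for illustration and are not taken from `classify.py`.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, binarize=False):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        if binarize:
            tokens = set(tokens)  # each word counts once per document
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(tokens, model, k=1.0, binarize=False):
    """Return the class with the highest log posterior under add-k smoothing."""
    class_counts, word_counts, vocab = model
    if binarize:
        tokens = set(tokens)
    n_docs = float(sum(class_counts.values()))
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][w] + k)
                                  / (total + k * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids numerical underflow when documents are long.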

Assignment 4: METHOD

* baseline represents Most-Frequent-Tag-Baseline
* hmm represents Hidden Markov Model (Bigram with add-k smoothing) and Viterbi Algorithm
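
The `hmm` option combines a bigram HMM with Viterbi decoding. The sketch below shows one way this can look with add-k smoothed transition and emission probabilities. It is an illustration under assumptions, not the code in `tag.py`: the function names, the `<s>` start marker, the value of `k`, and the single extra vocabulary slot for unseen words are all choices made for this example.

```python
import math
from collections import Counter


def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from (word, tag) sentences."""
    trans, emit = Counter(), Counter()
    from_counts, tag_counts = Counter(), Counter()
    vocab = set()
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            trans[(prev, tag)] += 1
            from_counts[prev] += 1
            emit[(tag, word)] += 1
            tag_counts[tag] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, from_counts, tag_counts, sorted(tag_counts), vocab


def viterbi(words, model, k=0.1):
    """Return the most probable tag sequence for words under the HMM."""
    trans, emit, from_counts, tag_counts, tags, vocab = model
    if not words:
        return []
    n_words = len(vocab) + 1  # one extra slot for unseen words

    def log_t(prev, tag):
        return math.log((trans[(prev, tag)] + k)
                        / (from_counts[prev] + k * len(tags)))

    def log_e(tag, word):
        return math.log((emit[(tag, word)] + k)
                        / (tag_counts[tag] + k * n_words))

    # best[i][tag] = (log-probability of the best path ending in tag, backpointer)
    best = [dict((t, (log_t("<s>", t) + log_e(t, words[0]), None)) for t in tags)]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, back = max((best[i - 1][p][0] + log_t(p, t), p) for p in tags)
            column[t] = (score + log_e(t, words[i]), back)
        best.append(column)

    # Follow backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return path
```

Decoding costs on the order of the sentence length times the square of the number of tags, since every tag at each position considers every possible previous tag.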

Assignment 5: METHOD

* overlap represents Chatbot responses based on word overlap
* w2v represents the response with the highest cosine similarity (using pre-trained fastText vectors)
* both represents both responses, from overlap and from w2v, with their cosine values
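
The two scoring ideas behind `overlap` and `w2v` can be sketched with built-in modules only. Everything named here is an assumption for illustration: the function names, the use of averaged word vectors to represent a sentence, and the `vectors` dictionary, which stands in for pre-trained fastText vectors loaded from disk. The actual `chatbot.py` may score and select responses differently.

```python
import math


def overlap_score(query_tokens, candidate_tokens):
    """Number of distinct words shared by the query and the candidate."""
    return len(set(query_tokens) & set(candidate_tokens))


def average_vector(tokens, vectors):
    """Average the vectors of known tokens; None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / float(len(known)) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_candidate(query_tokens, candidates, vectors):
    """Pick the candidate whose averaged vector is closest to the query's."""
    q = average_vector(query_tokens, vectors)
    best, best_score = None, float("-inf")
    for cand in candidates:
        c = average_vector(cand, vectors)
        score = cosine(q, c) if q is not None and c is not None else 0.0
        if score > best_score:
            best, best_score = cand, score
    return best, best_score
```

Word overlap rewards exact matches only, while the vector-based score can also match related words that do not appear verbatim in the query.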