https://github.com/vaasudevans/natural-language-processing-assignments
UNB Fall-2018 NLP Assignments 💬
baseline bigrams hidden-markov-model information-retrieval-based-chatbot language-models nlp python27 sentiment-analysis unb unigram
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/vaasudevans/natural-language-processing-assignments
- Owner: VaasuDevanS
- Created: 2018-12-04T20:23:58.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-02-25T14:07:02.000Z (over 7 years ago)
- Last Synced: 2025-02-06T18:45:47.595Z (over 1 year ago)
- Topics: baseline, bigrams, hidden-markov-model, information-retrieval-based-chatbot, language-models, nlp, python27, sentiment-analysis, unb, unigram
- Language: Python
- Homepage:
- Size: 23.5 MB
- Stars: 3
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Natural-Language-Processing-Assignments
University of New Brunswick Fall-2018 CS6765: Natural Language Processing
This repository contains the Python code for the Fall-term assignments.
All code targets Python 2.7 and uses only built-in modules (no numpy or nltk).
The one exception is Assignment 3, which uses scikit-learn (sklearn) for logistic regression.
## Getting started
| Assignment | Python file(s) | Usage |
|:-:|:-:|:-:|
| 1 | tokenize.py<br>count.py | `python tokenize.py FILE > FILE.tokens`<br>`python count.py FILE.tokens > FILE.freqs` |
| 2 | lm.py<br>perplexity.py | `python lm.py MODEL TRAIN_FILE TEST_FILE > OUTPUT`<br>`python perplexity.py OUTPUT` |
| 3 | classify.py<br>score.py | `python classify.py METHOD TRAIN_DOCS TRAIN_CLASSES TEST_DOCS > PREDICTED_CLASSES`<br>`python score.py PREDICTED_CLASSES TRUE_CLASSES` |
| 4 | tag.py<br>accuracy.py | `python tag.py TRAIN_FILE TEST_FILE METHOD > SYSTEM_OUTPUT`<br>`python accuracy.py TRUE_TAGS SYSTEM_OUTPUT` |
| 5 | chatbot.py | `python chatbot.py METHOD` |
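Assignment 1 tokenizes a tweet file and then counts token frequencies. Below is a minimal sketch of that two-step pipeline, assuming a simple regex tokenizer (runs of word characters, with each punctuation mark as its own token); the actual rules in tokenize.py are not described in this README, and `tokenize` and `count` are illustrative names. Like the repository, it uses only the standard library, and it runs on Python 2.7 and 3.

```python
import re
from collections import Counter

# Assumed tokenization rule: a token is a run of word characters or a single
# punctuation mark. tokenize.py may use different rules.
TOKEN_RE = re.compile(r"\w+|[^\w\s]", re.UNICODE)


def tokenize(line):
    """Return the list of tokens in one line of text."""
    return TOKEN_RE.findall(line)


def count(tokens):
    """Return (token, frequency) pairs, most frequent first, ties broken alphabetically."""
    freqs = Counter(tokens)
    return sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    lines = ["I love NLP!", "NLP is fun, I think."]
    tokens = [tok for line in lines for tok in tokenize(line)]
    for tok, freq in count(tokens):
        print("%s\t%d" % (tok, freq))
```

In the repository the two steps are separate scripts connected through `FILE.tokens`; here they are chained in memory for brevity.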
## Arguments
| Assignment | Argument | Value or file location (inside each assignment's folder) |
|:-:|:-:|:-:|
| 1 | `FILE` | `Data/tweets-en.txt.gz` |
| 2 | `MODEL`<br>`TRAIN_FILE`<br>`TEST_FILE` | `1`, `2`, or `interp`<br>`Data/reuters-train.txt`<br>`Data/reuters-dev.txt` |
| 3 | `METHOD`<br>`TRAIN_DOCS`<br>`TRAIN_CLASSES`<br>`TEST_DOCS`<br>`TRUE_CLASSES` | `baseline`, `lr`, `lexicon`, `nb`, or `nbbin`<br>`Data/train.docs.txt`<br>`Data/train.classes.txt`<br>`Data/dev.docs.txt`<br>`Data/dev.classes.txt` |
| 4 | `TRAIN_FILE`<br>`TEST_FILE`<br>`METHOD`<br>`TRUE_TAGS` | `Data/train.en.txt`<br>`Data/dev.en.words.txt`<br>`baseline` or `hmm`<br>`Data/dev.en.tags.txt` |
| 5 | `METHOD` | `overlap`, `w2v`, or `both` |
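The scoring scripts (score.py for Assignment 3, accuracy.py for Assignment 4) compare a system's output with the gold labels. The README does not say which metrics they report or how the files are laid out, so this sketch assumes one label (or tag) per line and shows accuracy plus macro-averaged F1 as illustrative metrics. `read_labels`, `accuracy`, and `macro_f1` are hypothetical names, not the repository's code.

```python
import io


def read_labels(path):
    """Read one label per non-empty line (an assumed file format)."""
    with io.open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def accuracy(predicted, gold):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold label lists differ in length")
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return float(correct) / len(gold)


def macro_f1(predicted, gold):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    f1s = []
    for c in sorted(set(gold) | set(predicted)):
        tp = sum(1 for p, g in zip(predicted, gold) if p == c and g == c)
        fp = sum(1 for p, g in zip(predicted, gold) if p == c and g != c)
        fn = sum(1 for p, g in zip(predicted, gold) if p != c and g == c)
        prec = float(tp) / (tp + fp) if tp + fp else 0.0
        rec = float(tp) / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Macro-averaging weights every class equally, which matters when one class (for example, one sentiment) dominates the dev set.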
Assignment 2 (`MODEL`):
* `1`: unigram model with add-1 smoothing
* `2`: bigram model with add-k smoothing
* `interp`: interpolated model combining the unigram and bigram estimates
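The three `MODEL` options differ only in how P(word | previous word) is estimated, and perplexity.py then summarizes the test-set probabilities. Below is a minimal sketch, assuming a single token stream with no sentence boundaries, one extra vocabulary slot for unseen words, a hypothetical `<s>` context for the first test token, and an interpolation weight `lam` chosen for illustration. The class names and the default `k` are hypothetical, not taken from lm.py.

```python
import math
from collections import Counter


class UnigramLM(object):
    """Unigram model with add-1 smoothing; one extra vocabulary slot stands for unseen words."""

    def __init__(self, tokens):
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab_size = len(self.counts) + 1  # +1 for the unknown-word type

    def prob(self, prev, word):
        # prev is ignored: a unigram model has no context.
        return float(self.counts[word] + 1) / (self.total + self.vocab_size)


class BigramLM(object):
    """Bigram model with add-k smoothing: P(w | v) = (c(v, w) + k) / (c(v) + k * |V|)."""

    def __init__(self, tokens, k=0.01):
        self.k = float(k)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.context = Counter(tokens[:-1])  # c(v), counted only where v has a successor
        self.vocab_size = len(set(tokens)) + 1  # +1 for the unknown-word type

    def prob(self, prev, word):
        num = self.bigrams[(prev, word)] + self.k
        return num / (self.context[prev] + self.k * self.vocab_size)


class InterpolatedLM(object):
    """Linear interpolation: lam * P_bigram(w | v) + (1 - lam) * P_unigram(w)."""

    def __init__(self, unigram, bigram, lam=0.5):
        self.unigram, self.bigram, self.lam = unigram, bigram, lam

    def prob(self, prev, word):
        return (self.lam * self.bigram.prob(prev, word)
                + (1 - self.lam) * self.unigram.prob(prev, word))


def perplexity(model, tokens, start="<s>"):
    """exp of the average negative log-probability of the test tokens."""
    log_sum = 0.0
    prev = start  # hypothetical start symbol used as the first token's context
    for word in tokens:
        log_sum += math.log(model.prob(prev, word))
        prev = word
    return math.exp(-log_sum / len(tokens))
```

Perplexity is exp(-(1/N) Σ log P(w_i | w_{i-1})), so a lower value means the model finds the test text less surprising; `k` and `lam` would normally be tuned on the dev file.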
Assignment 3 (`METHOD`):
* `baseline`: most-frequent-class baseline
* `lr`: logistic regression (from scikit-learn)
* `lexicon`: sentiment lexicon of positive (+) and negative (-) words
* `nb`: Naive Bayes with add-k smoothing
* `nbbin`: binarized Naive Bayes
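For `nb` and `nbbin` the only difference is in counting: binarized Naive Bayes clips each word's count to 1 per document before training and classification. Below is a minimal sketch with add-k smoothing, assuming documents are already tokenized into lists and that test words never seen in training are skipped. `NaiveBayes` and its methods are hypothetical names, not classify.py's interface.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes(object):
    """Multinomial Naive Bayes with add-k smoothing; binarized=True gives binarized NB."""

    def __init__(self, k=1.0, binarized=False):
        self.k = float(k)
        self.binarized = binarized

    def _prepare(self, tokens):
        # Binarized NB counts each word type at most once per document.
        return set(tokens) if self.binarized else tokens

    def fit(self, docs, labels):
        """docs: list of token lists; labels: the parallel list of class names."""
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            tokens = self._prepare(tokens)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = dict((c, sum(self.word_counts[c].values())) for c in self.doc_counts)
        self.n_docs = len(docs)
        return self

    def log_score(self, tokens, label):
        score = math.log(float(self.doc_counts[label]) / self.n_docs)  # log prior
        denom = self.totals[label] + self.k * len(self.vocab)
        for w in self._prepare(tokens):
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][w] + self.k) / denom)
        return score

    def predict(self, tokens):
        # Iterating over sorted class names makes tie-breaking deterministic.
        return max(sorted(self.doc_counts), key=lambda c: self.log_score(tokens, c))
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on long documents.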
Assignment 4 (`METHOD`):
* `baseline`: most-frequent-tag baseline
* `hmm`: bigram hidden Markov model with add-k smoothing, decoded with the Viterbi algorithm
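The `hmm` tagger scores a tag sequence with smoothed transition P(tag | previous tag) and emission P(word | tag) probabilities, and the Viterbi algorithm finds the best sequence by dynamic programming in log space. Below is a minimal sketch, assuming training sentences arrive as lists of (word, tag) pairs, a hypothetical `<s>` start pseudo-tag, and an illustrative `k=0.1`. `BigramHMM` is a hypothetical name, and tag.py may organize this differently.

```python
import math
from collections import Counter, defaultdict


class BigramHMM(object):
    """Bigram HMM tagger with add-k smoothed transition and emission probabilities."""

    def __init__(self, tagged_sents, k=0.1):
        self.k = float(k)
        self.trans = defaultdict(Counter)  # trans[prev_tag][tag] counts
        self.emit = defaultdict(Counter)   # emit[tag][word] counts
        words = set()
        for sent in tagged_sents:
            prev = "<s>"  # hypothetical sentence-start pseudo-tag
            for word, tag in sent:
                self.trans[prev][tag] += 1
                self.emit[tag][word] += 1
                words.add(word)
                prev = tag
        self.tags = sorted(self.emit)
        self.vocab_size = len(words) + 1  # +1 for unseen words

    def log_trans(self, prev, tag):
        row = self.trans.get(prev, Counter())
        return math.log((row[tag] + self.k) / (sum(row.values()) + self.k * len(self.tags)))

    def log_emit(self, tag, word):
        row = self.emit[tag]
        return math.log((row[word] + self.k) / (sum(row.values()) + self.k * self.vocab_size))

    def viterbi(self, words):
        """Return the most probable tag sequence for a list of words."""
        if not words:
            return []
        # best[i][t] = (log prob of the best path ending in tag t at word i, previous tag)
        best = [dict((t, (self.log_trans("<s>", t) + self.log_emit(t, words[0]), None))
                     for t in self.tags)]
        for i in range(1, len(words)):
            column = {}
            for t in self.tags:
                score, back = max((best[i - 1][p][0] + self.log_trans(p, t), p)
                                  for p in self.tags)
                column[t] = (score + self.log_emit(t, words[i]), back)
            best.append(column)
        # Follow backpointers from the highest-scoring final tag.
        tag = max(self.tags, key=lambda t: best[-1][t][0])
        path = [tag]
        for i in range(len(words) - 1, 0, -1):
            tag = best[i][tag][1]
            path.append(tag)
        return path[::-1]
```

Each step keeps only the best path ending in each tag, so decoding costs O(n·T²) for n words and T tags rather than scoring all Tⁿ possible sequences.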
Assignment 5 (`METHOD`):
* `overlap`: chooses the chatbot response by word overlap
* `w2v`: returns the response with the highest cosine similarity, using pre-trained fastText word vectors
* `both`: returns the responses from both `overlap` and `w2v`, together with their cosine values
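An information-retrieval chatbot picks a stored response by scoring it against the user's input. Below is a minimal sketch of the two scoring methods, assuming the corpus is a list of (prompt, response) pairs, that the input is matched against the prompts, and that a sentence vector is the mean of its word vectors. The function names are hypothetical, and the README does not say how chatbot.py pairs or preprocesses utterances. `load_vec` reads the plain-text `.vec` format that fastText uses for its pre-trained vectors (a `count dim` header, then one word and its values per line).

```python
import io
import math


def load_vec(path, limit=None):
    """Load a fastText .vec file into a {word: [floats]} dict, optionally only the first `limit` words."""
    vectors = {}
    with io.open(path, encoding="utf-8") as f:
        next(f)  # skip the "count dim" header line
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def overlap_score(query_tokens, candidate_tokens):
    """Number of distinct word types shared by the two utterances."""
    return len(set(query_tokens) & set(candidate_tokens))


def average_vector(tokens, vectors):
    """Mean of the vectors of in-vocabulary tokens, or None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    return [sum(v[i] for v in known) / float(len(known)) for i in range(len(known[0]))]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def respond(query, pairs, method="overlap", vectors=None):
    """Return (response, score) for the pair whose prompt best matches the query."""
    q = query.lower().split()
    best_resp, best_score = None, None
    for prompt, response in pairs:
        p = prompt.lower().split()
        if method == "overlap":
            score = overlap_score(q, p)
        elif method == "w2v":
            qv, pv = average_vector(q, vectors), average_vector(p, vectors)
            score = cosine(qv, pv) if qv is not None and pv is not None else -1.0
        else:
            raise ValueError("method must be 'overlap' or 'w2v'")
        if best_score is None or score > best_score:
            best_resp, best_score = response, score
    return best_resp, best_score
```

A `both` mode could call `respond` once per method and report the two results side by side.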