https://github.com/vaasudevans/natural-language-processing-assignments

UNB Fall-2018 NLP Assignments 💬

baseline bigrams hidden-markov-model information-retrieval-based-chatbot language-models nlp python27 sentiment-analysis unb unigram

Last synced: about 1 year ago

Host: GitHub
URL: https://github.com/vaasudevans/natural-language-processing-assignments
Owner: VaasuDevanS
Created: 2018-12-04T20:23:58.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2019-02-25T14:07:02.000Z (over 7 years ago)
Last Synced: 2025-02-06T18:45:47.595Z (over 1 year ago)
Topics: baseline, bigrams, hidden-markov-model, information-retrieval-based-chatbot, language-models, nlp, python27, sentiment-analysis, unb, unigram
Language: Python
Homepage:
Size: 23.5 MB
Stars: 3
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Natural-Language-Processing-Assignments

University of New Brunswick Fall-2018 CS6765: Natural Language Processing

This repository contains the Python code for the Fall term assignments.  

The code is written for Python 2.7 using only built-in modules; numpy and nltk are not used anywhere.  

sklearn is used only in Assignment 3, for Logistic Regression.

## Getting started

| No | Python-file | Usage |
|:-:|:-:|:-:|
| 1 | tokenize.py<br>count.py | `python tokenize.py FILE > FILE.tokens`<br>`python count.py FILE.tokens > FILE.freqs` |
| 2 | lm.py<br>perplexity.py | `python lm.py MODEL TRAIN_FILE TEST_FILE > OUTPUT`<br>`python perplexity.py OUTPUT` |
| 3 | classify.py<br>score.py | `python classify.py METHOD TRAIN_DOCS TRAIN_CLASSES TEST_DOCS > PREDICTED_CLASSES`<br>`python score.py PREDICTED_CLASSES TRUE_CLASSES` |
| 4 | tag.py<br>accuracy.py | `python tag.py TRAIN_FILE TEST_FILE METHOD > SYSTEM_OUTPUT`<br>`python accuracy.py TRUE_TAGS SYSTEM_OUTPUT` |
| 5 | chatbot.py | `python chatbot.py METHOD` |

## Arguments

| No | Arguments | Value or File-Location (in individual assignment folder) |
|:-:|:-:|:-:|
| 1 | FILE | Data/tweets-en.txt.gz |
| 2 | MODEL<br>TRAIN_FILE<br>TEST_FILE | 1 or 2 or interp<br>Data/reuters-train.txt<br>Data/reuters-dev.txt |
| 3 | METHOD<br>TRAIN_DOCS<br>TRAIN_CLASSES<br>TEST_DOCS<br>TRUE_CLASSES | baseline or lr or lexicon or nb or nbbin<br>Data/train.docs.txt<br>Data/train.classes.txt<br>Data/dev.docs.txt<br>Data/dev.classes.txt |
| 4 | TRAIN_FILE<br>TEST_FILE<br>METHOD<br>TRUE_TAGS | Data/train.en.txt<br>Data/dev.en.words.txt<br>baseline or hmm<br>Data/dev.en.tags.txt |
| 5 | METHOD | overlap or w2v or both |

Assignment 2: - 

MODEL  

* 1 represents Unigram (with Add-1 smoothing)

* 2 represents Bigram (with Add-k smoothing)

* interp represents Interpolated (both Unigram and Bigram)
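
The three MODEL options can be sketched with the standard library only. This is a minimal illustration, not the code in `lm.py`: the function names, the `<s>`/`</s>` padding symbols, the values of `k` and `lam`, and the closed-vocabulary simplification (unseen words get no dedicated unknown-word slot) are all assumptions made for the example.

```python
from collections import Counter
import math

def train_counts(sentences):
    """Count unigrams and bigrams over tokenized sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def unigram_prob(word, unigrams, k=1.0):
    """Add-k unigram estimate; k=1.0 is add-1 (Laplace) smoothing."""
    total = sum(unigrams.values())
    return (unigrams[word] + k) / (total + k * len(unigrams))

def bigram_prob(prev, word, unigrams, bigrams, k=0.1):
    """Add-k bigram estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * len(unigrams))

def interp_prob(prev, word, unigrams, bigrams, lam=0.5, k=0.1):
    """Linear interpolation of the bigram and unigram estimates (lam is a hypothetical weight)."""
    return (lam * bigram_prob(prev, word, unigrams, bigrams, k)
            + (1.0 - lam) * unigram_prob(word, unigrams))

def perplexity(sentences, prob_fn):
    """Perplexity of a conditional model prob_fn(prev, word) over padded sentences."""
    log_sum, n = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            log_sum += math.log(prob_fn(prev, word))
            n += 1
    return math.exp(-log_sum / n)

# Toy data standing in for the Reuters files
train = [["a", "b"], ["a", "c"]]
uni, bi = train_counts(train)
print(perplexity(train, lambda p, w: interp_prob(p, w, uni, bi)))
```

Working in log space inside `perplexity` avoids underflow when many small probabilities are multiplied together.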

Assignment 3: - 

METHOD  

* baseline represents Most-Frequent-Class-Baseline

* lr represents Logistic Regression (from sklearn)

* lexicon represents Sentiment Lexicon containing + and - words

* nb represents Naive Bayes Model (with add-k smoothing)

* nbbin represents Binarized Naive Bayes
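
As a rough illustration of the `nb` and `nbbin` options, the sketch below trains a multinomial Naive Bayes classifier with add-k smoothing, with an optional binarized mode that counts each word at most once per document. It is not the code in `classify.py`: the function names, the dictionary layout of the model, the value of `k`, and the choice to skip words unseen in training are assumptions for the example.

```python
from collections import Counter, defaultdict
import math

def train_nb(docs, labels, k=1.0, binarize=False):
    """Collect class and per-class word counts from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        if binarize:
            tokens = set(tokens)  # each word counts once per document
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts,
            "vocab": vocab, "k": k, "binarize": binarize}

def predict_nb(model, tokens):
    """Return the class with the highest log prior plus smoothed log likelihood."""
    if model["binarize"]:
        tokens = set(tokens)
    n_docs = float(sum(model["class_counts"].values()))
    vocab_size = len(model["vocab"])
    k = model["k"]
    best_label, best_score = None, float("-inf")
    for label in sorted(model["class_counts"]):
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(model["class_counts"][label] / n_docs)
        for word in tokens:
            if word not in model["vocab"]:
                continue  # assumption: ignore words never seen in training
            score += math.log((counts[word] + k) / (total + k * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data standing in for train.docs.txt / train.classes.txt
docs = [["good", "great"], ["bad", "awful"], ["good", "fun"]]
labels = ["positive", "negative", "positive"]
model = train_nb(docs, labels)
print(predict_nb(model, ["awful", "bad"]))
```

Passing `binarize=True` to `train_nb` switches the same code to the binarized variant.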

Assignment 4: - 

METHOD 

* baseline represents Most-Frequent-Tag-Baseline

* hmm represents Hidden Markov Model (Bigram with add-k smoothing) and Viterbi Algorithm
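
The `hmm` option combines a bigram tag model with Viterbi decoding. The sketch below shows one way this can look with built-in modules only; it is not the code in `tag.py`. The function names, the `<s>` start symbol, the value of `k`, the single extra vocabulary slot reserved for unseen words, and the absence of an end-of-sentence state are assumptions made for the example.

```python
from collections import Counter
import math

def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from lists of (word, tag) pairs."""
    transitions, emissions, tag_counts = Counter(), Counter(), Counter()
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"
        tag_counts[prev] += 1
        for word, tag in sentence:
            transitions[(prev, tag)] += 1
            emissions[(tag, word)] += 1
            tag_counts[tag] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, tag_counts, vocab

def viterbi(words, transitions, emissions, tag_counts, vocab, k=0.1):
    """Return the most probable tag sequence under add-k smoothed estimates."""
    if not words:
        return []
    tags = sorted(t for t in tag_counts if t != "<s>")
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot so unseen words get some mass

    def log_trans(prev, tag):
        return math.log((transitions[(prev, tag)] + k) / (tag_counts[prev] + k * n_tags))

    def log_emit(tag, word):
        return math.log((emissions[(tag, word)] + k) / (tag_counts[tag] + k * n_words))

    # Each column maps tag -> (best log probability ending in tag, backpointer)
    columns = [{t: (log_trans("<s>", t) + log_emit(t, words[0]), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((columns[-1][p][0] + log_trans(p, t), p) for p in tags)
            column[t] = (score + log_emit(t, word), prev)
        columns.append(column)

    # Follow backpointers from the best final tag
    path = [max(tags, key=lambda t: columns[-1][t][0])]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path

# Toy data standing in for train.en.txt
train = [[("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
         [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]]
print(viterbi(["the", "dog", "sleeps"], *train_hmm(train)))
```

The most-frequent-tag baseline needs none of this machinery: it only looks up the tag seen most often with each word in the training data.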

Assignment 5: - 

METHOD  

* overlap represents Chatbot responses based on the word overlap

* w2v represents Response with highest Cosine value (from pre-trained vectors from fastText)

* both represents both responses from overlap and w2v with their Cosine values
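
The two scoring ideas behind `overlap` and `w2v` can be sketched as follows. This is an illustration rather than the code in `chatbot.py`: the function names are hypothetical, the tiny two-dimensional vectors stand in for the pre-trained fastText vectors, and representing a sentence by the average of its word vectors is an assumption, since the README does not say how sentence vectors are built.

```python
import math

def overlap_score(query_tokens, candidate_tokens):
    """Number of distinct words shared by the query and the candidate."""
    return len(set(query_tokens) & set(candidate_tokens))

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sentence_vector(tokens, vectors, dim):
    """Average the vectors of known words (assumed representation); skip unknown words."""
    total, n = [0.0] * dim, 0
    for token in tokens:
        if token in vectors:
            total = [t + x for t, x in zip(total, vectors[token])]
            n += 1
    return [t / n for t in total] if n else total

def best_response(query, candidates, score_fn):
    """Pick the candidate utterance with the highest score against the query."""
    return max(candidates, key=lambda c: score_fn(query, c))

# Toy vectors standing in for fastText embeddings
vectors = {"hello": [1.0, 0.0], "hi": [0.9, 0.1], "bye": [0.0, 1.0]}
candidates = [["hi", "there"], ["bye", "now"]]
query = ["hello"]

def w2v_score(q, c):
    return cosine(sentence_vector(q, vectors, 2), sentence_vector(c, vectors, 2))

print(best_response(query, candidates, w2v_score))
print(best_response(["bye", "bye"], candidates, overlap_score))
```

Word overlap only rewards exact matches, while the vector-based score can still match `hello` to `hi`, which is one reason to report both.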
