https://github.com/rooth13/problemset1
Data 340: Fall 2023
- Host: GitHub
- URL: https://github.com/rooth13/problemset1
- Owner: rootH13
- Created: 2023-11-19T01:52:44.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-24T03:55:42.000Z (over 2 years ago)
- Last Synced: 2025-03-08T07:41:55.581Z (over 1 year ago)
- Language: Jupyter Notebook
- Size: 2.31 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ProblemSet1
Data 340: Fall 2023
For Problem Set 1:
Naive Bayes Model
- After extracting all of the files from Kaggle, I took some excerpts
from class notebooks to tokenize words and phrases using different tokenizers
(spaCy, Gensim, and NLTK).
- I then moved on to the Naive Bayes model. Judging from the first few
columns of the files, a disproportionate number of reviews have a sentiment
of 2. As expected, my confusion matrix showed the most density
in the 2s for sentiment. My model's performance, however, was only
about 59%. I will have to keep training it to get
better accuracy and performance.
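The tokenize, train, and confusion-matrix steps above can be sketched in plain Python. This is a minimal from-scratch illustration, not the notebook's actual code: the regex tokenizer is only a stand-in for the spaCy, Gensim, and NLTK tokenizers, the toy reviews and the labels 0, 2, and 4 are made up for the example, and the helper names (`tokenize`, `train_nb`, `predict_nb`, `confusion_matrix`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of letters; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def train_nb(docs, labels):
    """Count documents per label and words per label."""
    label_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    """Pick the label with the highest log prior plus log likelihood."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                # Laplace (add-one) smoothing
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def confusion_matrix(true, pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true, pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Hypothetical toy data in which label 2 dominates, as in the problem set.
train = [
    ("a dull and lifeless mess", 0),
    ("the film runs two hours", 2),
    ("a story about a family", 2),
    ("set in a small town", 2),
    ("the cast includes newcomers", 2),
    ("a brilliant and moving triumph", 4),
]
docs = [text for text, _ in train]
labels = [label for _, label in train]
model = train_nb(docs, labels)

preds = [predict_nb(model, text) for text in docs]
for row in confusion_matrix(labels, preds, [0, 2, 4]):
    print(row)
```

When a review contains no words seen in training, only the class prior is left, so this sketch predicts 2. That is one way a skewed label distribution can concentrate a confusion matrix in the 2 column.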
Logistic Regression Model
- I tried to follow logic similar to what we used in class, using sklearn to
output performance. My logistic regression model performed much worse than my
Naive Bayes model, but still showed the most clustering/density in the 2s.
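A logistic regression run with sklearn might look roughly like the sketch below. The toy reviews, the labels 0, 2, and 4, and the choice of `TfidfVectorizer` are assumptions made for illustration, not the features or data used in the notebook. The majority-class baseline is included because, with so many reviews labeled 2, a score is easier to interpret next to what always predicting 2 would achieve.

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.pipeline import make_pipeline

# Hypothetical toy data in which label 2 dominates.
texts = [
    "a dull and lifeless mess",
    "the film runs two hours",
    "a story about a family",
    "set in a small town",
    "the cast includes newcomers",
    "a brilliant and moving triumph",
]
labels = [0, 2, 2, 2, 2, 4]

# Vectorize the text and fit the classifier in one pipeline.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

# Scored on the training data only to keep the sketch short;
# a real evaluation would use a held-out split.
pred = clf.predict(texts)
accuracy = accuracy_score(labels, pred)
cm = confusion_matrix(labels, pred, labels=[0, 2, 4])

# Accuracy of always predicting the most common label.
baseline = Counter(labels).most_common(1)[0][1] / len(labels)

print("accuracy:", accuracy)
print("majority-class baseline:", baseline)
print(cm)
```

If the model's accuracy is not clearly above the baseline, it has mostly learned the label distribution, which would match a confusion matrix concentrated in the 2 column.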