https://github.com/rooth13/problemset1

Data 340: Fall 2023

Host: GitHub
URL: https://github.com/rooth13/problemset1
Owner: rootH13
Created: 2023-11-19T01:52:44.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-11-24T03:55:42.000Z (over 2 years ago)
Last Synced: 2025-03-08T07:41:55.581Z (over 1 year ago)
Language: Jupyter Notebook
Size: 2.31 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# ProblemSet1

Data 340: Fall 2023

For Problem Set 1:

Naive Bayes Model

- After extracting all of the files from Kaggle, I adapted excerpts from the
  class notebooks to tokenize words and phrases with three different tokenizers
  (spaCy, Gensim, and NLTK).

- I then moved on to the Naive Bayes model. From looking at the files' first
  few columns, there seems to be a disproportionate number of reviews with a
  sentiment of 2. As expected, my confusion matrix showed the most density in
  the 2s. However, my model's performance was only about 59%, so I will have
  to keep training it to get better accuracy.
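For reference, here is a minimal from-scratch sketch of a multinomial Naive Bayes classifier with a confusion matrix and accuracy score. It is not the notebook's code: the regex `tokenize` function stands in for the spaCy, Gensim, and NLTK tokenizers, the add-one (Laplace) smoothing is an assumption, and the reviews and the 0 to 4 label range are hypothetical toy data in which label 2 is deliberately over-represented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for spaCy/Gensim/NLTK: lowercase, keep runs of
    # letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        # Per-class token counts.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, doc):
        n_docs = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log P(class) + sum of log P(token | class); unseen tokens are skipped.
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.total_words[c] + self.alpha * len(self.vocab)
            for tok in tokenize(doc):
                if tok in self.vocab:
                    score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def confusion_matrix(y_true, y_pred, classes):
    # Rows are true labels, columns are predicted labels.
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy reviews; label 2 is deliberately the most common.
train_docs = [
    "a dreadful, boring mess",
    "dull and tiresome",
    "an ordinary film",
    "the movie is a movie",
    "it has actors in it",
    "pleasant and fun",
    "a brilliant, moving masterpiece",
]
train_labels = [0, 1, 2, 2, 2, 3, 4]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
test_docs = ["a brilliant masterpiece", "boring and dull", "an ordinary movie", "simply wonderful"]
test_labels = [4, 1, 2, 3]
preds = model.predict(test_docs)
print(preds)
print(accuracy(test_labels, preds))
for row in confusion_matrix(test_labels, preds, model.classes):
    print(row)
```

With these made-up reviews, "simply wonderful" contains no word seen in training, so its score is the class prior alone and it is assigned to 2, the most frequent label. This is one way an imbalanced label distribution pulls predictions, and the confusion matrix, toward the majority class.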

Logistic Regression Model

- I tried to follow the logic used in class, using sklearn to output
  performance metrics. My logistic regression model performed much worse than
  my Naive Bayes model, but its confusion matrix still showed the most
  clustering/density in the 2s.
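To show what a logistic regression model is fitting, here is a small sketch of multinomial (softmax) logistic regression trained with full-batch gradient descent on bag-of-words counts. It is an illustration only: the notebook used sklearn, whose `LogisticRegression` applies L2 regularization (`C=1.0`) and the `lbfgs` solver by default, while this sketch uses no regularization. The tokenizer, learning rate, epoch count, and toy reviews are all hypothetical assumptions.

```python
import math
import re


def tokenize(text):
    # Hypothetical stand-in tokenizer: lowercase word-like runs.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(docs):
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(doc, vocab):
    # Bag-of-words counts plus a constant bias feature in the last slot.
    x = [0.0] * (len(vocab) + 1)
    x[-1] = 1.0
    for tok in tokenize(doc):
        if tok in vocab:
            x[vocab[tok]] += 1.0
    return x


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_softmax_regression(X, y, n_classes, lr=0.1, epochs=200):
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    losses = []
    for _ in range(epochs):
        grad = [[0.0] * n_features for _ in range(n_classes)]
        loss = 0.0
        for x, label in zip(X, y):
            scores = [sum(w * v for w, v in zip(W[k], x)) for k in range(n_classes)]
            probs = softmax(scores)
            loss -= math.log(probs[label])  # cross-entropy
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_features):
                    grad[k][j] += err * x[j]
        losses.append(loss / len(X))  # average loss before this update
        for k in range(n_classes):
            for j in range(n_features):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W, losses


def predict(W, x):
    scores = [sum(w * v for w, v in zip(row, x)) for row in W]
    return max(range(len(W)), key=lambda k: scores[k])


# Hypothetical toy reviews with labels 0-4.
train_docs = [
    "a dreadful, boring mess", "dull and tiresome", "an ordinary film",
    "the movie is a movie", "it has actors in it", "pleasant and fun",
    "a brilliant, moving masterpiece",
]
train_labels = [0, 1, 2, 2, 2, 3, 4]

vocab = build_vocab(train_docs)
X = [featurize(d, vocab) for d in train_docs]
W, losses = train_softmax_regression(X, train_labels, n_classes=5)
test_docs = ["a brilliant masterpiece", "boring and dull", "simply wonderful"]
preds = [predict(W, featurize(d, vocab)) for d in test_docs]
print(preds)
```

The same `confusion_matrix` and accuracy bookkeeping used for Naive Bayes applies to these predictions, which makes it straightforward to compare the two models on the same held-out reviews.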
