https://github.com/pgdr/mood
- Host: GitHub
- URL: https://github.com/pgdr/mood
- Owner: pgdr
- License: gpl-3.0
- Created: 2016-05-30T07:03:04.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2016-05-30T09:39:19.000Z (about 10 years ago)
- Last Synced: 2025-02-01T23:42:08.227Z (over 1 year ago)
- Language: Python
- Size: 27.3 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# mood, in a sentimental
This program takes two folders, "pos" and "neg", containing positive and
negative text files respectively, and learns to predict the mood of a new text file.
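Reading the two folders could look like the minimal sketch below. It assumes the layout `root/pos/` and `root/neg/` with one UTF-8 document per file; `load_corpus` is a hypothetical helper, not a function from the repository.

```python
import tempfile
from pathlib import Path


def load_corpus(root):
    """Read every file under root/pos and root/neg (hypothetical helper).

    Returns (texts, labels) with label 1 for "pos" and 0 for "neg".
    Files are read in sorted order so the result is deterministic.
    """
    texts, labels = [], []
    for folder, label in (("pos", 1), ("neg", 0)):
        for path in sorted((Path(root) / folder).iterdir()):
            if path.is_file():
                texts.append(path.read_text(encoding="utf-8", errors="replace"))
                labels.append(label)
    return texts, labels


if __name__ == "__main__":
    # Build a tiny throwaway corpus just to exercise load_corpus.
    with tempfile.TemporaryDirectory() as tmp:
        for folder, text in (("pos", "a lovely, moving film"), ("neg", "dull and far too long")):
            (Path(tmp) / folder).mkdir()
            (Path(tmp) / folder / "1.txt").write_text(text, encoding="utf-8")
        texts, labels = load_corpus(tmp)
        print(labels)  # [1, 0]
```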
The algorithm proceeds as follows:
* Tokenize, lemmatize and remove stop words for every data point
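* Pick 1000 "good" words (how to choose them is still an open question)
* Arrange these words in a fixed-order vector of 1000 words
* For each input data file, construct a 1000-dimensional boolean vector: the characteristic function of the word vector (entry *i* is 1 if and only if word *i* occurs in the file)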
* Train an SVM with a radial basis function (Gaussian) kernel on the dataset
* ???
* Predict.
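The preprocessing and characteristic-vector steps above might be sketched as follows. This is an illustration, not the repository's code: the regex tokenizer, the tiny `STOP_WORDS` set, and the names `preprocess` and `characteristic_vector` are assumptions. Lemmatization is skipped to keep the sketch dependency-free, and the vocabulary is taken as given rather than selected.

```python
import re

# Tiny illustrative stop-word list (assumption, not the repo's list);
# a real pipeline would use a fuller one, e.g. NLTK's English stop words.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "this", "was", "in"}


def preprocess(text):
    """Lower-case, tokenize on letters/apostrophes, and drop stop words.

    A lemmatizer could be applied to each token here; it is omitted
    so the sketch runs with the standard library only.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def characteristic_vector(text, vocabulary):
    """Return the 0/1 indicator vector of `vocabulary` for one document.

    Entry i is 1 if vocabulary[i] occurs in the preprocessed text, else 0.
    """
    present = set(preprocess(text))
    return [1 if word in present else 0 for word in vocabulary]


if __name__ == "__main__":
    vocabulary = ["great", "boring", "plot", "awful"]  # stand-in for the 1000 words
    print(characteristic_vector("The plot was great, great fun.", vocabulary))
    # [1, 0, 1, 0]
```

Using a set for lookup makes each vector cost one pass over the document plus one pass over the vocabulary; repeated words ("great, great") still map to a single 1, which is what a characteristic function means.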
Using a 20% validation split (to be replaced by 10-fold cross-validation later),
the predictor currently achieves about 80% accuracy with the supplied `words.txt` file.
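A minimal sketch of the training and evaluation, assuming scikit-learn (the README does not name the library): an RBF-kernel `SVC` scored on a 20% held-out split, plus the planned 10-fold cross-validation via `cross_val_score`. The data come from the hypothetical `make_toy_data` generator, so the printed accuracies say nothing about the ~80% figure reported for the real dataset.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC


def make_toy_data(n_docs=200, n_words=50, seed=0):
    """Synthetic stand-in for the boolean word vectors (NOT the repo's data).

    Positive documents tend to contain the first half of the vocabulary,
    negative documents the second half.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_docs)
    half = n_words // 2
    probs = np.where(
        y[:, None] == 1,
        np.r_[np.full(half, 0.6), np.full(n_words - half, 0.1)],
        np.r_[np.full(half, 0.1), np.full(n_words - half, 0.6)],
    )
    X = (rng.random((n_docs, n_words)) < probs).astype(np.uint8)
    return X, y


def holdout_accuracy(X, y, test_size=0.2, seed=0):
    """Train an RBF-kernel SVM on 80% of the data and score it on the other 20%."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    clf = SVC(kernel="rbf")  # Gaussian kernel; C and gamma left at their defaults
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)  # mean accuracy on the held-out split


def ten_fold_accuracy(X, y):
    """Mean accuracy over 10-fold cross-validation (the planned replacement)."""
    return cross_val_score(SVC(kernel="rbf"), X, y, cv=10).mean()


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"20% hold-out accuracy: {holdout_accuracy(X, y):.2f}")
    print(f"10-fold CV accuracy:   {ten_fold_accuracy(X, y):.2f}")
```

A single 20% split gives one noisy estimate; 10-fold cross-validation averages ten estimates and uses every document for testing exactly once, which is why it is the better long-term choice.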
There is also a PCA implementation that projects the dataset onto its first two
and first three principal components and visualizes the result using matplotlib.
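One way to compute such a projection is to center the data and take its singular value decomposition; the sketch below does this with NumPy and plots with matplotlib. `pca_project` and `plot_projection` are hypothetical names, and the repository may use a library PCA instead.

```python
import numpy as np


def pca_project(X, k):
    """Project rows of X onto its top-k principal components.

    Centers the data, takes the SVD, and multiplies by the first k right
    singular vectors (the principal directions, sorted by variance).
    """
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def plot_projection(Z, labels):
    """Scatter a 2D or 3D projection, coloured by label (1 = pos, 0 = neg)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    labels = np.asarray(labels)
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d" if Z.shape[1] == 3 else None)
    for value, name in ((1, "pos"), (0, "neg")):
        pts = Z[labels == value]
        ax.scatter(*pts.T, label=name, s=10)  # x, y (and z in 3D)
    ax.legend()
    plt.show()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = (rng.random((100, 30)) < 0.3).astype(float)  # toy boolean vectors
    print(pca_project(X, 2).shape, pca_project(X, 3).shape)  # (100, 2) (100, 3)
```

With real data, calling `plot_projection(pca_project(X, 2), labels)` or `plot_projection(pca_project(X, 3), labels)` would draw the 2D or 3D scatter plot.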
The 2D PCA projection (a [visualization of the 3D plot](https://gfycat.com/SingleVerifiableDiscus) is also available):
