https://github.com/pgdr/mood


Host: GitHub
URL: https://github.com/pgdr/mood
Owner: pgdr
License: gpl-3.0
Created: 2016-05-30T07:03:04.000Z (about 10 years ago)
Default Branch: master
Last Pushed: 2016-05-30T09:39:19.000Z (about 10 years ago)
Last Synced: 2025-02-01T23:42:08.227Z (over 1 year ago)
Language: Python
Size: 27.3 KB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# mood, in a sentimental

This program takes two folders, "pos" and "neg", containing positive
(resp. negative) text files and learns to predict the mood of a new text file.

The algorithm proceeds as follows:

* Tokenize, lemmatize and remove stop words for every data point

* Pick 1000 "good" words (how?)

* Create a word vector consisting of these 1000 words

* For each input data file, construct a 1000-dimensional boolean vector, the characteristic function of the word vector: entry i is true exactly when word i occurs in the file

* Train an SVM (with radial basis function (Gaussian) kernel) on the dataset

* ???

* Predict.
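
The feature-construction steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: `STOP_WORDS`, `VOCABULARY` and all function names are hypothetical, the vocabulary has 5 words instead of 1000, and lemmatization is left out. The `rbf_kernel` helper only shows the similarity value a Gaussian-kernel SVM works with; it does not train anything.

```python
import math
import re

# Hypothetical tiny stop-word list and vocabulary. The project reads its
# vocabulary from words.txt, whose format is not shown in this README.
STOP_WORDS = {"a", "an", "and", "is", "it", "the", "this", "was"}
VOCABULARY = ["good", "great", "bad", "boring", "love"]


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def characteristic_vector(text, vocabulary=VOCABULARY):
    """Return one boolean per vocabulary word: does the word occur in text?"""
    present = set(tokenize(text))
    return [word in present for word in vocabulary]


def rbf_kernel(x, y, gamma=0.1):
    """Gaussian kernel exp(-gamma * ||x - y||^2) for two boolean vectors.

    For 0/1 vectors the squared Euclidean distance equals the number of
    positions where the vectors differ (the Hamming distance).
    """
    distance_sq = sum(a != b for a, b in zip(x, y))
    return math.exp(-gamma * distance_sq)


pos = characteristic_vector("This was a great film, and I love it.")
neg = characteristic_vector("The plot is bad and the acting is boring.")
print(pos)  # [False, True, False, False, True]
print(neg)  # [False, False, True, True, False]
print(rbf_kernel(pos, neg))  # exp(-0.4), roughly 0.67
```

The two example texts share no vocabulary words, so their vectors differ in four positions and the kernel value drops below 1; identical vectors give exactly 1.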

Using 20% cross validation (to be replaced by 10-fold CV later), the predictor
currently achieves ~80% accuracy with the given words.txt file.
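
To make the two evaluation schemes concrete, here is a small sketch of how the index splits could be produced. It assumes that "20% cross validation" means a single split with 20% of the files held out for testing, which the README does not spell out; the function names and the sample count of 50 are hypothetical.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle the indices and reserve test_fraction of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


train, test = holdout_split(50)
print(len(train), len(test))  # 40 10

for train, test in k_fold_splits(50, k=10):
    print(len(train), len(test))  # 45 5 on every fold
```

With a single hold-out the accuracy estimate depends on which files land in the test set; k-fold averages over k disjoint test sets, so every file contributes to the estimate once.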

There is also a PCA implementation that projects the dataset onto a 2- or
3-dimensional subspace and visualizes the result using matplotlib.
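
As a rough illustration of the projection step, the sketch below computes PCA with NumPy's singular value decomposition. It is not the repository's implementation: `pca_project` is a hypothetical name, NumPy is an assumed dependency, and the random 0/1 matrix merely stands in for the real boolean word vectors.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto their top principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20, 1000))  # stand-in for 20 boolean word vectors
points_2d = pca_project(X, 2)
print(points_2d.shape)  # (20, 2)
```

The resulting columns can be passed to matplotlib, for example `plt.scatter(points_2d[:, 0], points_2d[:, 1])`, with the points colored by their pos/neg label; `n_components=3` gives the coordinates for a 3D plot.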

The PCA in 2D (here is a [visualization of the 3D plot](https://gfycat.com/SingleVerifiableDiscus)):

![2D PCA](http://i.stack.imgur.com/4Uxya.png)
