https://github.com/zivl/nlp_final

final nlp project at IDC 2013 - by Ziv and Shachar

NLP@IDC – Final Course Project, Spring 2013

Song Genre Classification




Today, we usually define a song’s genre by its musical elements – melody, instruments, etc.


We intend to find out whether a genre can be defined through the lyrics alone – that is, whether the written elements of songs contain significant genre information.

Why?


Because it's AWESOME!



Also, this is important to the study of literature and poetry – it could hint at a cognitive connection between melodies and ideas, and help us define the term “genre” and its meanings, thus grounding the study of literature and poetry in something a bit more robust than just personal taste and opinion.

How? Logistic Regression


Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable based on one or more predictor variables.
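
As a rough illustration of the idea (not the project's actual implementation), here is a minimal sketch of binary logistic regression trained by gradient descent. It assumes only two genres and two hypothetical features per song (how often the words "baby" and "darkness" appear); the tiny data set is invented for the example, and a real genre classifier would need a multi-class variant and far more features.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: maps any real number into (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    # Probability that feature vector x belongs to the class labelled 1.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=500):
    # Batch gradient descent on the average log-loss.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Invented toy data: [count of "baby", count of "darkness"] per song.
# Label 1 = first genre, label 0 = second genre (both hypothetical).
X = [[3, 0], [2, 0], [0, 2], [0, 3]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
```

After training, `predict_proba(w, b, [4, 0])` is above 0.5 and `predict_proba(w, b, [0, 4])` is below 0.5, so the predictor variables (word counts) drive the categorical outcome (genre).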


Variables



  • Words – we’ll examine how distinct the vocabulary of each genre is.

  • Structure – if time allows, we’ll examine whether genres also impose a certain structure on the lyrics.

  • Diversity – even if genres use the same words, do they repeat themselves (“Baby, baby, baby oooh”), or does each song contain a wider selection of words?
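
The three variables above could be turned into numbers along these lines. This is a sketch under assumptions, not the project's feature extractor: the tokenizer, the use of type-token ratio as the diversity measure, and the two structure features are all illustrative choices, and every function name here is hypothetical.

```python
import re
from collections import Counter

def tokenize(lyrics):
    # Lowercase the text and keep runs of letters and apostrophes as words.
    return re.findall(r"[a-z']+", lyrics.lower())

def word_counts(lyrics):
    # Words: how often each word occurs in one song.
    return Counter(tokenize(lyrics))

def type_token_ratio(lyrics):
    # Diversity: distinct words divided by total words (lower = more repetition).
    tokens = tokenize(lyrics)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def structure_features(lyrics):
    # Structure: number of non-empty lines and average words per line.
    lines = [ln for ln in lyrics.splitlines() if ln.strip()]
    n_tokens = sum(len(tokenize(ln)) for ln in lines)
    return {
        "n_lines": len(lines),
        "avg_words_per_line": n_tokens / len(lines) if lines else 0.0,
    }
```

For the repetitive line quoted above, `type_token_ratio("Baby, baby, baby oooh")` is 0.5: four words, of which only two are distinct.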

Limitations and Future Enhancements


  • Lyrics databases are not quality-controlled and do not use a consistent format across songs (especially obvious with “censored” words, which can be written in different ways – S-it, Sh*t, *%#$, etc.).

  • There are no existing treebanks of song lyrics, which limits us to low-level analysis.

  • There are no efficient data-mining tools for lyrics – either we create them, or we hand-create all of our datasets.
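
One possible way to reduce the formatting noise around censored words is to map every masked spelling to a single placeholder before counting words. The sketch below is a heuristic under stated assumptions: the set of masking symbols, the `<censored>` placeholder, and the function names are all hypothetical, and the hyphen rule will also catch legitimate words such as "x-ray", so it would need a whitelist in practice.

```python
import re

CENSORED = "<censored>"  # hypothetical placeholder token

# Assumption: these symbols are only used to mask letters.
_SYMBOL_MASK = re.compile(r"[*#%$@]")
# Assumption: one letter, hyphen(s), then letters (e.g. "S-it") is a masked word.
# This is crude and produces false positives on words like "x-ray".
_HYPHEN_MASK = re.compile(r"^[A-Za-z]-+[A-Za-z]+$")

def normalize_token(token):
    # Drop surrounding punctuation, then collapse masked spellings.
    core = token.strip(".,!?;:\"'")
    if _SYMBOL_MASK.search(core) or _HYPHEN_MASK.match(core):
        return CENSORED
    return core.lower()

def normalize_line(line):
    # Split on whitespace, normalize each token, and drop empty results.
    normalized = (normalize_token(tok) for tok in line.split())
    return [tok for tok in normalized if tok]
```

With this rule, the three spellings listed above ("S-it", "Sh*t", "*%#$") all become the same token, so they are counted as one vocabulary item instead of three.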