https://github.com/zivl/nlp_final
final nlp project at IDC 2013 - by Ziv and Shachar
- Host: GitHub
- URL: https://github.com/zivl/nlp_final
- Owner: zivl
- Created: 2013-05-30T14:56:02.000Z (about 13 years ago)
- Default Branch: master
- Last Pushed: 2013-07-07T07:23:57.000Z (almost 13 years ago)
- Last Synced: 2025-03-14T21:04:51.189Z (about 1 year ago)
- Language: Java
- Size: 1.92 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
NLP@IDC – Final Course Project, Spring 2013
Song Genre Classification
Today, we usually define a song’s genre by its musical elements: melody, instruments, etc.
We want to find out whether a genre can be identified from the lyrics alone, that is, whether the written elements of songs contain significant genre information.
Why?
Because it’s AWESOME!
Also, this is important to the study of literature and poetry – it could hint at a cognitive connection between melodies and ideas, and help us define the term “genre” and its meanings, thus grounding the study of literature and poetry in something a bit more robust than just personal taste and opinion.
How? Logistic Regression
Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable based on one or more predictor variables.
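To make the idea concrete, here is a minimal sketch of binary logistic regression trained with batch gradient descent. It is not the project's actual implementation: the class name `LogisticRegressionSketch`, its method names, and the training parameters are all hypothetical. It also handles only two classes (for example "is this song genre X or not"); telling several genres apart would need one such model per genre (one-vs-rest) or a multinomial variant.

```java
/** Illustrative binary logistic regression; all names here are assumptions. */
class LogisticRegressionSketch {
    final double[] weights; // one weight per predictor variable
    double bias;

    LogisticRegressionSketch(int numFeatures) {
        this.weights = new double[numFeatures];
    }

    /** The logistic (sigmoid) function maps any real score into (0, 1). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Estimated probability that the feature vector x belongs to class 1. */
    double predictProbability(double[] x) {
        double z = bias;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * x[i];
        }
        return sigmoid(z);
    }

    /** Batch gradient descent on the log-loss; labels must be 0 or 1. */
    void train(double[][] xs, int[] ys, int epochs, double learningRate) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] gradW = new double[weights.length];
            double gradB = 0.0;
            for (int n = 0; n < xs.length; n++) {
                // Gradient of the log-loss w.r.t. the score is (prediction - label).
                double error = predictProbability(xs[n]) - ys[n];
                for (int i = 0; i < weights.length; i++) {
                    gradW[i] += error * xs[n][i];
                }
                gradB += error;
            }
            // Average the gradient over the examples and take one step.
            for (int i = 0; i < weights.length; i++) {
                weights[i] -= learningRate * gradW[i] / xs.length;
            }
            bias -= learningRate * gradB / xs.length;
        }
    }
}
```

Each input vector would hold the predictor variables for one song (for instance word counts), and the predicted probability can be thresholded at 0.5 to obtain a class label.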
Variables
- Words – we’ll examine how distinct the vocabulary of each genre is.
- Structure – if time allows, we’ll examine whether genres also impose a certain structure on the lyrics.
- Diversity – even if genres use the same words, do they repeat themselves (“Baby, baby, baby oooh”), or does each song contain a wider selection of words?
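The Words and Diversity variables above could be computed roughly as follows. This is a sketch under assumptions, not the project's code: the class `LyricsFeatures`, its methods, and the tokenization rule (runs of letters and apostrophes, lowercased) are hypothetical choices, and diversity is measured here as a simple type-token ratio (distinct words divided by total words). The Structure variable is not covered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative feature extraction for lyrics; all names here are assumptions. */
class LyricsFeatures {
    // Assumed tokenization rule: a word is a run of letters and apostrophes.
    private static final Pattern WORD = Pattern.compile("[\\p{L}']+");

    /** Lowercases the lyrics and extracts the words in order of appearance. */
    static List<String> tokenize(String lyrics) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(lyrics.toLowerCase(Locale.ROOT));
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Words: how often each word occurs in one song. */
    static Map<String, Integer> wordCounts(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Diversity: distinct words divided by total words (0.0 for empty lyrics). */
    static double typeTokenRatio(List<String> tokens) {
        if (tokens.isEmpty()) {
            return 0.0;
        }
        return (double) new HashSet<>(tokens).size() / tokens.size();
    }
}
```

For the line “Baby, baby, baby oooh”, this yields a count of 3 for `baby` and a type-token ratio of 0.5 (2 distinct words out of 4), so repetitive lyrics score low on diversity.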
Limitations and Future Enhancements
Lyrics databases aren’t quality-controlled and don’t use a consistent format across songs. This is especially obvious with “censored” words, which can be written in different ways (S-it, Sh*t, *%#$, etc.).
There are no existing treebanks for song lyrics, which limits us to low-level analysis.
There are no efficient data-mining tools: either we create them, or we hand-create all of our data sets.
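One possible way to reduce the inconsistency in censored words is to map the recognizable variants to a single placeholder before counting words. The sketch below is only an illustration: the class `CensoredWordNormalizer`, the placeholder string, and the two matching rules are assumptions, not part of the project. It catches starred forms such as Sh*t and pure symbol runs such as *%#$, but deliberately leaves hyphen-masked forms such as S-it alone, because without a word list they cannot be told apart from ordinary hyphenated words.

```java
import java.util.regex.Pattern;

/** Illustrative normalizer for censored words; all names here are assumptions. */
class CensoredWordNormalizer {
    static final String PLACEHOLDER = "<censored>";

    // Rule 1 (assumed): two or more masking symbols and nothing else, e.g. *%#$.
    private static final Pattern SYMBOL_RUN = Pattern.compile("[*%#$@!&]{2,}");
    // Rule 2 (assumed): letters followed by one or more '*', e.g. Sh*t or f**k.
    private static final Pattern STARRED = Pattern.compile("\\p{L}+\\*+\\p{L}*");

    /** Checks one whitespace-delimited token, ignoring trailing punctuation. */
    static boolean isCensored(String token) {
        String core = token.replaceAll("[.,;:?]+$", "");
        return SYMBOL_RUN.matcher(core).matches() || STARRED.matcher(core).matches();
    }

    /** Replaces every censored token in a line of lyrics with the placeholder. */
    static String normalize(String line) {
        StringBuilder out = new StringBuilder();
        for (String token : line.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(isCensored(token) ? PLACEHOLDER : token);
        }
        return out.toString();
    }
}
```

With this in place, different spellings of the same masked word contribute to one feature instead of many rare ones, at the cost of no longer distinguishing which word was censored.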