https://github.com/randikabanura/spark_ml_music_genre_prediction
BDAT Assignment
- Host: GitHub
- URL: https://github.com/randikabanura/spark_ml_music_genre_prediction
- Owner: randikabanura
- License: mit
- Created: 2023-03-25T15:12:56.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2023-04-17T04:36:01.000Z (about 2 years ago)
- Last Synced: 2025-02-05T02:39:07.713Z (4 months ago)
- Language: Java
- Size: 425 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Music Genre Classification (Spark ML)
## Idea
Create a few examples that demonstrate regression, classification, and clustering to Java developers.
The main focus is on feature extraction and on building interesting ML pipelines.

### Genre Classification
Given part of a song's lyrics, recognize the song's genre.

Strategy:
* Collect a raw data set of lyrics (~30k sentences in total) for the following genres:
  * Pop
  * Country
  * Blues
  * Jazz
  * Rock
  * Reggae
  * Hiphop
  * Metal
* Create a training set, i.e. a label (0|1|2|3|4|5|6|7) plus features
* Train a logistic regression model

## Build, Configure and Run
### Build
Standard build:
```
./gradlew clean build shadowJar
```
Quick build without tests:
```
./gradlew clean build shadowJar -x test
```
### Configuration
All available configuration properties are spread out via 3 files:
* application.properties - contains business logic specific stuff
* spark.properties - contains Spark specific stuffAll properties are self explanatory, but few the most important ones are listed explicitly below.
#### Application Properties
| Name | Type | Default value | Description |
| ---- | ---- | ------------- | ----------- |
| server.port | Integer | 9090 | The port to listen on for incoming HTTP requests |

#### Spark Properties
| Name | Type | Default value | Description |
| ---- | ---- | ------------- | ----------- |
| spark.master | String | spark://127.0.0.1:7077 | The URL of the Spark master. For development purposes, you can use `local[n]`, which runs Spark on n threads on the local machine without connecting to a cluster. For example, `local[2]`. |
| spark.distributed-libraries | String | | Path to the distributed library that should be loaded into each worker of a Spark cluster. |

#### Sample configuration for a local development environment
Create *application.properties* (for instance, in your user home directory) and override any of the properties described above.
For example, the minimum set of values that should be specified for your local environment is:
```
spark.distributed-libraries=/spark-distributed-library/build/libs/spark-distributed-library-1.0-SNAPSHOT-all.jar
lyrics.training.set.directory.path=data/lyrics/
lyrics.model.directory.path=data/lyrics/model
```
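The layering described above works like this: documented defaults apply unless the local *application.properties* overrides them. The sketch below illustrates that behaviour in plain Java with `java.util.Properties`. It is an assumption for illustration only. The README's run instructions rely on Spring's `spring.config.location`, so this is not the project's actual loading code, and the class name `LayeredConfigSketch` is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration of "defaults + local overrides"; the real project uses Spring.
public class LayeredConfigSketch {

    // Defaults copied from the property tables in this README.
    static Properties defaults() {
        Properties d = new Properties();
        d.setProperty("server.port", "9090");
        d.setProperty("spark.master", "spark://127.0.0.1:7077");
        return d;
    }

    // Reads overrides on top of the defaults; any key absent from the
    // override source falls back to the defaults via Properties' lookup chain.
    static Properties load(Reader overrides) {
        Properties effective = new Properties(defaults());
        try {
            effective.load(overrides);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return effective;
    }

    public static void main(String[] args) {
        // Stand-in for a local application.properties; local[2] follows the spark.master tip above.
        String local = String.join("\n",
                "spark.master=local[2]",
                "lyrics.training.set.directory.path=data/lyrics/",
                "lyrics.model.directory.path=data/lyrics/model");
        Properties config = load(new StringReader(local));
        System.out.println(config.getProperty("spark.master"));                // local[2]
        System.out.println(config.getProperty("server.port"));                 // 9090 (default)
        System.out.println(config.getProperty("lyrics.model.directory.path")); // data/lyrics/model
    }
}
```

Passing the defaults through the `Properties(Properties defaults)` constructor keeps the two layers separate. `getProperty` consults the override first and only then the defaults.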
### Run
From your favourite IDE, run the `ApplicationConfiguration` main method.
This uses the default configuration bundled in the source code.

To run the application with a custom configuration, add the `spring.config.location` parameter pointing to the directory that contains your custom *application.properties* (in our example, your user home directory), or list the files explicitly, for instance:
```
spring.config.location=/Users//application.properties
```

## Presentation and Demo
You can watch the Spark ML model making predictions in the following video.
[Watch the video](https://drive.google.com/file/d/1Ou_vmNAkbeLf8k_XJLyk0Em0rUg7OWds/view?usp=share_link)
## Author
Name: [Banura Randika Perera](https://github.com/randikabanura)
Linkedin: [randika-banura](https://www.linkedin.com/in/randika-banura/)
Email: [[email protected]](mailto:[email protected])

## Show your support
Please ⭐️ this repository if this project helped you!
## License
See [LICENSE](LICENSE) © [randikabanura](https://github.com/randikabanura/)