https://github.com/ravi72munde/scala-spark-cab-rides-predictions

# scala-spark-cab-rides-predictions
A big data project for predicting prices of Uber/Lyft rides depending on the weather.

The dataset was compiled and uploaded to Kaggle. It is available at https://www.kaggle.com/ravi72munde/uber-lyft-cab-prices

## Contributors:
* Ravi Munde
* Karan Barai

### Project Structure:
* cab-price-connector - Data collection Scala project
* Databricks_Prediction_code.html - Analysis and Spark model (from Databricks.com)
* Cab_Price_Prediction.ipynb - Random Forest model in Python

### Data Model:

#### CabPrice
root
    |- cab_type : String
    |- destination : String
    |- distance : Float
    |- id : String
    |- name : String
    |- price : Float
    |- product_id : String
    |- source : String
    |- surge_multiplier : String
    |- time_stamp : Long
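
As a rough illustration, the CabPrice schema above could be mirrored by a plain Scala case class. This is only a sketch: the names `CabPrice`, `surgeAsDouble`, and `CabPriceExample` are hypothetical and not taken from the project, and the sample values are made up for illustration.

```scala
import scala.util.Try

// Hypothetical mirror of the CabPrice schema; field names and types
// follow the schema listing, not the project's actual source code.
final case class CabPrice(
  cab_type: String,
  destination: String,
  distance: Float,
  id: String,
  name: String,
  price: Float,
  product_id: String,
  source: String,
  surge_multiplier: String, // listed as String in the schema
  time_stamp: Long
) {
  // Assumed helper: parse the surge multiplier when it holds a number.
  def surgeAsDouble: Option[Double] = Try(surge_multiplier.toDouble).toOption
}

object CabPriceExample {
  // Illustrative values only; none of these come from the dataset.
  val sample: CabPrice = CabPrice(
    cab_type = "Uber",
    destination = "example-destination",
    distance = 1.5f,
    id = "example-id",
    name = "example-product-name",
    price = 12.5f,
    product_id = "example-product-id",
    source = "example-source",
    surge_multiplier = "1.0",
    time_stamp = 1543284000000L
  )
}
```

Keeping `surge_multiplier` as a `String` matches the listing; the optional conversion avoids a runtime failure if a record carries a non-numeric value.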

#### Weather
root
    |- clouds : Float
    |- humidity : Float
    |- location : Float
    |- location : String
    |- temp : String
    |- pressure : Float
    |- wind : Float

![Actor System](Actors.png)

Sample log of the actor system running on EC2:

`INFO [CabRideSystem-akka.actor.default-dispatcher-2] a.DynamoActor - received 12 number of weather records`
`INFO [CabRideSystem-akka.actor.default-dispatcher-4] a.DynamoActor - Weather Batch processed on DynamoDB`
`INFO [CabRideSystem-akka.actor.default-dispatcher-9] a.DynamoActor - received 156 number of cab price records`
`INFO [CabRideSystem-akka.actor.default-dispatcher-8] a.DynamoActor - Cab Prices Batch processed on DynamoDB`
`INFO [CabRideSystem-akka.actor.default-dispatcher-7] a.Master - Cab ride data piped to Dynamo Actor`
`INFO [CabRideSystem-akka.actor.default-dispatcher-13] a.DynamoActor - received 156 number of cab price records`
`INFO [CabRideSystem-akka.actor.default-dispatcher-15] a.DynamoActor - Cab Prices Batch processed on DynamoDB`

*NOTE: AWS credentials need to be set in environment variables.*
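
The README does not say which variables the project reads. The sketch below assumes the standard names recognized by the AWS SDKs, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, and shows one way to check for them at startup. `AwsEnv`, `AwsCredentials`, and `fromEnv` are hypothetical names used only for this example.

```scala
object AwsEnv {
  // Hypothetical holder type for this sketch, not part of the project.
  final case class AwsCredentials(accessKeyId: String, secretAccessKey: String)

  // Standard environment variable names recognized by the AWS SDKs.
  val AccessKeyVar = "AWS_ACCESS_KEY_ID"
  val SecretKeyVar = "AWS_SECRET_ACCESS_KEY"

  // Returns the credentials, or a message naming the first missing variable.
  // The environment is a parameter so the check can be exercised without
  // touching the real process environment.
  def fromEnv(env: Map[String, String] = sys.env): Either[String, AwsCredentials] =
    for {
      id     <- env.get(AccessKeyVar).filter(_.nonEmpty).toRight(s"$AccessKeyVar is not set")
      secret <- env.get(SecretKeyVar).filter(_.nonEmpty).toRight(s"$SecretKeyVar is not set")
    } yield AwsCredentials(id, secret)
}
```

Calling `AwsEnv.fromEnv()` before starting the actor system would surface a missing variable early instead of at the first DynamoDB write.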

### Model Evaluation Metrics
* Regression R-squared = 0.62
* Random Forest regression price prediction accuracy: 92.79 %
* Random Forest classification surge prediction accuracy: 77.69 %
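
For reference, the standard definitions of R-squared and classification accuracy can be sketched in plain Scala as follows. This is not the project's evaluation code; `Metrics`, `rSquared`, and `accuracy` are hypothetical names. The README does not state how the regression "price prediction accuracy" was computed, so that figure is not reproduced here.

```scala
object Metrics {
  // Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
  def rSquared(actual: Seq[Double], predicted: Seq[Double]): Double = {
    require(actual.nonEmpty && actual.length == predicted.length,
      "inputs must be non-empty and of equal length")
    val mean  = actual.sum / actual.length
    // Sum of squared residuals between observed and predicted values.
    val ssRes = actual.zip(predicted).map { case (y, yHat) => math.pow(y - yHat, 2) }.sum
    // Total sum of squares around the mean of the observed values.
    val ssTot = actual.map(y => math.pow(y - mean, 2)).sum
    1.0 - ssRes / ssTot
  }

  // Fraction of predictions that exactly match the true label.
  def accuracy[A](actual: Seq[A], predicted: Seq[A]): Double = {
    require(actual.nonEmpty && actual.length == predicted.length,
      "inputs must be non-empty and of equal length")
    val correct = actual.zip(predicted).count { case (y, yHat) => y == yHat }
    correct.toDouble / actual.length
  }
}
```

A model that always predicts the mean of the observed values gets an R-squared of 0, and a perfect model gets 1, which gives some context for the 0.62 reported above.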

[Figure: confusion matrix for the classifier]