Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/somjit101/nyc-taxi-demand-prediction

This is a Time Series Forecasting and Regression solution to project the no. of pick-ups at and around a given region at a given time in the city of New York, USA.
https://github.com/somjit101/nyc-taxi-demand-prediction

baseline-model clustering dask dask-dataframes data-cleaning data-visualization exponential-moving-average fft-analysis fourier-transform k-means-clustering new-york-taxi-demand-prediction performance-analysis random-forest-regression scikit-learn time-series-analysis time-series-forecasting xgboost-regression

Last synced: 25 days ago
JSON representation

This is a Time Series Forecasting and Regression solution to project the no. of pick-ups at and around a given region at a given time in the city of New York, USA.

Awesome Lists containing this project

README

        

# NYC-Taxi-Demand-Prediction
This is a Time Series Forecasting and Regression solution to project the no. of pick-ups at and around a given region at a given time in the city of New York, USA.

## Dataset Information

We can get the data from [here](http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml) (2016 data).

The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC)

### Information on taxis:

**

Yellow Taxi: Yellow Medallion Taxicabs
**

These are the famous NYC yellow taxis that provide transportation exclusively through street-hails. The number of taxicabs is limited by a finite number of medallions issued by the TLC. You access this mode of transportation by standing in the street and hailing an available taxi with your hand. The pickups are not pre-arranged.

**

For Hire Vehicles (FHVs)
**

FHV transportation is accessed by a pre-arrangement with a dispatcher or limo company. These FHVs are not permitted to pick up passengers via street hails, as those rides are not considered pre-arranged.

**

Green Taxi: Street Hail Livery (SHL)
**

The SHL program will allow livery vehicle owners to license and outfit their vehicles with green borough taxi branding, meters, credit card machines, and ultimately the right to accept street hails in addition to pre-arranged rides.


Credits: Quora

**

Note:
**
In the given notebook we are considering only the yellow taxis for the time period between Jan - Mar 2015 & Jan - Mar 2016

## Data Collection
We Have collected all yellow taxi trips data from jan-2015 to dec-2016(Will be using only 2015 data)

file name
file name size
number of records
number of features

yellow_tripdata_2016-01
1. 59G
10906858
19

yellow_tripdata_2016-02
1. 66G
11382049
19

yellow_tripdata_2016-03
1. 78G
12210952
19

yellow_tripdata_2016-04
1. 74G
11934338
19

yellow_tripdata_2016-05
1. 73G
11836853
19

yellow_tripdata_2016-06
1. 62G
11135470
19

yellow_tripdata_2016-07
884Mb
10294080
17

yellow_tripdata_2016-08
854Mb
9942263
17

yellow_tripdata_2016-09
870Mb
10116018
17

yellow_tripdata_2016-10
933Mb
10854626
17

yellow_tripdata_2016-11
868Mb
10102128
17

yellow_tripdata_2016-12
897Mb
10449408
17

yellow_tripdata_2015-01
1.84Gb
12748986
19

yellow_tripdata_2015-02
1.81Gb
12450521
19

yellow_tripdata_2015-03
1.94Gb
13351609
19

yellow_tripdata_2015-04
1.90Gb
13071789
19

yellow_tripdata_2015-05
1.91Gb
13158262
19

yellow_tripdata_2015-06
1.79Gb
12324935
19

yellow_tripdata_2015-07
1.68Gb
11562783
19

yellow_tripdata_2015-08
1.62Gb
11130304
19

yellow_tripdata_2015-09
1.63Gb
11225063
19

yellow_tripdata_2015-10
1.79Gb
12315488
19

yellow_tripdata_2015-11
1.65Gb
11312676
19

yellow_tripdata_2015-12
1.67Gb
11460573
19

## Features in the dataset:

Field Name
Description

VendorID

A code indicating the TPEP provider that provided the record.


  1. Creative Mobile Technologies

  2. VeriFone Inc.

tpep_pickup_datetime
The date and time when the meter was engaged.

tpep_dropoff_datetime
The date and time when the meter was disengaged.

Passenger_count
The number of passengers in the vehicle. This is a driver-entered value.

Trip_distance
The elapsed trip distance in miles reported by the taximeter.

Pickup_longitude
Longitude where the meter was engaged.

Pickup_latitude
Latitude where the meter was engaged.

RateCodeID
The final rate code in effect at the end of the trip.


  1. Standard rate

  2. JFK

  3. Newark

  4. Nassau or Westchester

  5. Negotiated fare

  6. Group ride

Store_and_fwd_flag
This flag indicates whether the trip record was held in vehicle memory before sending to the vendor,
aka “store and forward,” because the vehicle did not have a connection to the server.

Y= store and forward trip

N= not a store and forward trip

Dropoff_longitude
Longitude where the meter was disengaged.

Dropoff_ latitude
Latitude where the meter was disengaged.

Payment_type
A numeric code signifying how the passenger paid for the trip.


  1. Credit card

  2. Cash

  3. No charge

  4. Dispute

  5. Unknown

  6. Voided trip

Fare_amount
The time-and-distance fare calculated by the meter.

Extra
Miscellaneous extras and surcharges. Currently, this only includes. the $0.50 and $1 rush hour and overnight charges.

MTA_tax
0.50 MTA tax that is automatically triggered based on the metered rate in use.

Improvement_surcharge
0.30 improvement surcharge assessed trips at the flag drop. the improvement surcharge began being levied in 2015.

Tip_amount
Tip amount – This field is automatically populated for credit card tips.Cash tips are not included.

Tolls_amount
Total amount of all tolls paid in trip.

Total_amount
The total amount charged to passengers. Does not include cash tips.


## ML Problem Formulation

Time-series forecasting and Regression




- To find number of pickups, given location cordinates(latitude and longitude) and time, in the query reigion and surrounding regions.


To solve the above we would be using data collected in Jan - Mar 2015 to predict the pickups in Jan - Mar 2016.



## Performance metrics
1. Mean Absolute percentage error.
2. Mean Squared error.