https://github.com/jcardonamde/datasets_ml

This project analyzes taxi and limousine trip data from New York City, with the goal of predicting the total duration of trips within the city using machine learning models.

data-science machine-learning machine-learning-algorithms matplotlib numpy pandas pipelines python seaborn sklearn


# New York City Taxi Trip Duration

![](https://docs.google.com/drawings/d/e/2PACX-1vRLrhh818nyaxd16zQGBnHCV325Gl2JGgCJFUQqJ9GIi-EQ3BtpeE0qz-4DaasifP3tAgW4Kztxt2tQ/pub?w=687&h=386)

At one time or another, almost all of us have used Uber or a similar transportation service to take a ride. Ridesharing services use online platforms to connect passengers with local drivers who use their personal vehicles.

In most cases, they are a convenient option for door-to-door transportation, and they are generally cheaper than licensed cabs. Examples of ridesharing services include Uber, Cabify, Beat, and Didi.

To improve the efficiency of cab dispatch systems for such services, it is important to be able to predict how long a driver will have their cab occupied. If a dispatcher knew approximately when a cab driver would finish their current trip, they could better identify which driver to assign to each pickup request.

This project works with a dataset published by the New York City Taxi and Limousine Commission, which includes pickup time, geographic coordinates, and number of passengers, among other variables. The goal of this project is to predict the total duration of cab trips in New York City.

👉 The dataset used for this analysis was downloaded [here](https://www.kaggle.com/c/nyc-taxi-trip-duration)
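As a rough illustration of how the pickup time and geographic coordinates described above can be turned into model inputs, here is a minimal sketch using only the Python standard library. The column names (`pickup_datetime`, `pickup_latitude`, and so on) and the sample record are assumptions made for this example, and the haversine distance is one common choice of feature, not necessarily the one used in this project's notebooks.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_features(row):
    """Derive simple features from one trip record (column names are assumed)."""
    pickup = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
    return {
        "distance_km": haversine_km(
            row["pickup_latitude"], row["pickup_longitude"],
            row["dropoff_latitude"], row["dropoff_longitude"],
        ),
        "pickup_hour": pickup.hour,
        "pickup_weekday": pickup.weekday(),  # Monday = 0
        "passenger_count": row["passenger_count"],
    }


# Hypothetical record, made up for illustration only.
example = {
    "pickup_datetime": "2016-03-14 17:24:55",
    "pickup_latitude": 40.7679,
    "pickup_longitude": -73.9822,
    "dropoff_latitude": 40.7656,
    "dropoff_longitude": -73.9646,
    "passenger_count": 1,
}
features = trip_features(example)
print(features)
```

Straight-line distance underestimates the distance actually driven, so it is best treated as a proxy that a model can combine with time-of-day features.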

💻📚 Libraries used: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn.

:microscope::dart: Applied models: Linear Regression, Decision Tree Regression, XGBoost Regression, and K-Nearest Neighbors (KNN) Regression.
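To give a sense of how several regressors like the ones listed above can be compared side by side, here is a minimal sketch with scikit-learn. It runs on synthetic data generated inside the script (not the real dataset), the feature names and hyperparameters are assumptions chosen for illustration, and XGBoost is left out so that the example depends only on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the trip data: duration grows with distance, plus noise.
rng = np.random.default_rng(0)
n_trips = 500
distance_km = rng.uniform(0.5, 20.0, n_trips)
pickup_hour = rng.integers(0, 24, n_trips)
X = np.column_stack([distance_km, pickup_hour])
y = 180.0 * distance_km + 60.0 + rng.normal(0.0, 30.0, n_trips)  # seconds

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline with a scaler.
models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

# Report models from lowest to highest test error.
for name, score in sorted(rmse.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {score:.1f} s")
```

Evaluating every model on the same held-out split keeps the comparison fair. On the real data, the ranking may differ from what this synthetic example produces.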

👀:bar_chart: Previews:

![](https://docs.google.com/drawings/d/e/2PACX-1vT71-ztcKxRuR5k8vL7Xwj_4Rwyech9vlwYkH5cG8h9Ihf6RhPj1fCw1-uIE_O4O-OtNfX8AQ3s-47l/pub?w=745&h=562)

![](https://docs.google.com/drawings/d/e/2PACX-1vRDyW_PQpwmmpEDO0putBjbiIP3QepLFXcazg6Z4lrgDOZrcka6oc77IMY2jvYdFotfQORX8ZJ3eUxW/pub?w=959&h=537)

![](https://docs.google.com/drawings/d/e/2PACX-1vRMJxGVooqZOS-61DMQ1thq8Nhxb62SArATlxy23qcx6G-tOwmvN5WGvEqtdX_RZTzBVIZH2689dmgJ/pub?w=914&h=518)

![](https://docs.google.com/drawings/d/e/2PACX-1vQynD4knXrhNVvKRB8tc-3GuFSEkF-S8ajHCNzdJe6385Z8brsgTS0cXOYRPmsM9G6pWB73r1ic_Z-W/pub?w=915&h=354)

![](https://docs.google.com/drawings/d/e/2PACX-1vRmvGaZqj53ac1losjZ4f0PJvh2-TsLBG2FDaYog5gRRYywZAHdz0Qn1iZxwm7EsYTWDWCQg6z5QLUz/pub?w=925&h=348)

![](https://docs.google.com/drawings/d/e/2PACX-1vTGVwU_nrYQVfe1qTKFRBB87PQwWBCBV0F70veX4N41YmesYy4a5QDqxESX9M5zydxWMzfMXwNmJFXN/pub?w=922&h=347)

![](https://docs.google.com/drawings/d/e/2PACX-1vR6G_M6QKq7bezu7bgCjA69reLA2C5irNGUFYWhKz6UI5bLfKAKp59ZbJWA87ockeVxNKsHjPI8B9DZ/pub?w=916&h=342)