https://github.com/victordibia/taxi

Exploring end to end ML pipelines on the Cloud
https://github.com/victordibia/taxi

Last synced: 11 months ago
JSON representation

Exploring end to end ML pipelines on the Cloud

Host: GitHub
URL: https://github.com/victordibia/taxi
Owner: victordibia
License: mit
Created: 2020-12-15T01:34:51.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2021-09-30T23:29:01.000Z (over 4 years ago)
Last Synced: 2024-12-17T18:50:30.637Z (over 1 year ago)
Language: Jupyter Notebook
Homepage: https://taxiadvisor.victordibia.com/
Size: 31.9 MB
Stars: 8
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          ## Taxi Advisor

> Ty a live demo - [https://taxiadvisor.victordibia.com](https://taxiadvisor.victordibia.com/)

This repo provides guidance on how to design and deploy an ML product (Taxi Advisor). It covers the  end-to-end process - data ingest, model training/evaluation, serving + frontend UX. The Taxi Advisor  example uses the [New York Taxi Cab](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) dataset and allows users to specify trip parameters (pickup Zone, drop off Zone and trip date/time) and provides predictions on trip duration and trip fare.

![Front End UI](docs/images/screen.jpg) 

## How It Works

![System Architecture](docs/images/taxipredictions.png)

- Data is ingested from the The New York City Taxi and Limousine Commission (TLC).

- A pair of models (Random Forest, MLP) are trained (multitask mode) to both predict fare and trip time using trip parameters (pickup locationID, drop off locationID and date/time). Model is then exported to Cloud Storage.

- Model is exported imported from Cloud Storage and served (with autoscaling) using Google Cloud AI Platform [prediction API](https://cloud.google.com/ai-platform/prediction/docs/getting-started-scikit-xgboost).

- Front end application collects user trip parameters and queries Cloud AI endpoint.

  

##  Components in this Repo

The links below show how sections of Taxi Advisor are implemented.

- [Data Ingest](notebooks). 

- [Model Training](notebooks): Train a set of models (decision tree, feed forward DNN) to predict fares _and_ trip time given properties of a trip (start and end location id, time of day, etc). Write trained model to a storage bucket.

- Model Serving

  - Cloud AI Platform (Model Serving) -> load trained model from GCS, serve over end point 

- [End User Application](app)

  - App Engine (Front End App) -> serve front end app to consume CloudAI API end point.  

## TODOs

Initial high level list of tasks: 

- [x] Data exploration

  - [x] Explore interesting data insights, data transformation tasks etc 

  - [ ] Automate preprocessing using Spark

- [x] Model Training: 

  - [ ] Explore a initial set of multitask models (Random Forests, MLP), 

  - [ ] Automated hyperparameter search, 

  - [ ] Distributed training and evaluation etc. 

  - [ ] Explore bayesian models that provide principled estimates of uncertainty.

- [ ] Automated pipeline (Composer) to run model training, evaluation, export and serving.

  - [ ] Automatically promote good models to production, 

- [x] Model Serving:  

  - [ ] Serving predictions over an Cloud AI endpoint 

- [x] Front end: User interface for exploring predictions.

  - [x] App engine serving frontend 

## Acknowledgement 

Google has generously supported this work by providing Google Cloud credits as part of the [Google Developer Expert program](https://developers.google.com/community/experts)!.  🙌🙌

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/victordibia/taxi

Awesome Lists containing this project

README