{"id":19676787,"url":"https://github.com/victordibia/taxi","last_synced_at":"2025-07-22T14:37:24.074Z","repository":{"id":55654434,"uuid":"321520078","full_name":"victordibia/taxi","owner":"victordibia","description":"Exploring end to end ML pipelines on the Cloud","archived":false,"fork":false,"pushed_at":"2021-09-30T23:29:01.000Z","size":33476,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-12-17T18:50:30.637Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://taxiadvisor.victordibia.com/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/victordibia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-12-15T01:34:51.000Z","updated_at":"2023-08-17T02:05:00.000Z","dependencies_parsed_at":"2022-08-15T05:40:32.465Z","dependency_job_id":null,"html_url":"https://github.com/victordibia/taxi","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victordibia%2Ftaxi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victordibia%2Ftaxi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victordibia%2Ftaxi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victordibia%2Ftaxi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/victordibia","download_url":"https://codeload.github.com/victordibia/taxi/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"
github","repositories_count":231976868,"owners_count":18454861,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T17:30:14.671Z","updated_at":"2025-01-01T02:06:49.989Z","avatar_url":"https://github.com/victordibia.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Taxi Advisor\n\n\u003e Try a live demo - [https://taxiadvisor.victordibia.com](https://taxiadvisor.victordibia.com/)\n\n\nThis repo provides guidance on how to design and deploy an ML product (Taxi Advisor). It covers the end-to-end process: data ingest, model training/evaluation, serving, and frontend UX. The Taxi Advisor example uses the [New York Taxi Cab](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) dataset. Users specify trip parameters (pickup zone, drop-off zone and trip date/time) and receive predictions of trip duration and trip fare.\n\n![Front End UI](docs/images/screen.jpg) \n\n\n## How It Works\n![System Architecture](docs/images/taxipredictions.png)\n\n- Data is ingested from the New York City Taxi and Limousine Commission (TLC).\n- A pair of models (Random Forest, MLP) is trained in multitask mode to predict both fare and trip time from trip parameters (pickup locationID, drop off locationID and date/time). 
The model is then exported to Cloud Storage.\n- The model is imported from Cloud Storage and served (with autoscaling) using the Google Cloud AI Platform [prediction API](https://cloud.google.com/ai-platform/prediction/docs/getting-started-scikit-xgboost).\n- The front end application collects user trip parameters and queries the Cloud AI endpoint.\n  \n\n\n## Components in this Repo\n\nThe links below show how sections of Taxi Advisor are implemented.\n\n- [Data Ingest](notebooks). \n- [Model Training](notebooks): Train a set of models (decision tree, feed forward DNN) to predict fares _and_ trip time given properties of a trip (start and end location ID, time of day, etc.). Write the trained model to a storage bucket.\n- Model Serving\n  - Cloud AI Platform (Model Serving) -\u003e load trained model from GCS, serve over an endpoint \n- [End User Application](app)\n  - App Engine (Front End App) -\u003e serve front end app to consume the Cloud AI API endpoint.  \n\n## TODOs\n\nInitial high level list of tasks: \n\n- [x] Data exploration\n  - [x] Explore interesting data insights, data transformation tasks etc \n  - [ ] Automate preprocessing using Spark\n- [x] Model Training: \n  - [ ] Explore an initial set of multitask models (Random Forests, MLP), \n  - [ ] Automated hyperparameter search, \n  - [ ] Distributed training and evaluation etc. \n  - [ ] Explore Bayesian models that provide principled estimates of uncertainty.\n- [ ] Automated pipeline (Composer) to run model training, evaluation, export and serving.\n  - [ ] Automatically promote good models to production, \n- [x] Model Serving:  \n  - [ ] Serving predictions over a Cloud AI endpoint \n- [x] Front end: User interface for exploring predictions.\n  - [x] App Engine serving frontend \n\n## Acknowledgement \n\nGoogle has generously supported this work by providing Google Cloud credits as part of the [Google Developer Expert program](https://developers.google.com/community/experts)!  
🙌🙌","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvictordibia%2Ftaxi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvictordibia%2Ftaxi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvictordibia%2Ftaxi/lists"}