{"id":18885408,"url":"https://github.com/atechguide/nyc-taxi-data-analysis","last_synced_at":"2026-05-14T01:40:15.275Z","repository":{"id":127852023,"uuid":"260873154","full_name":"aTechGuide/nyc-taxi-data-analysis","owner":"aTechGuide","description":"Spark App to Analyse NYC Taxi Data","archived":false,"fork":false,"pushed_at":"2020-05-03T12:00:54.000Z","size":4373,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-12-31T04:41:57.009Z","etag":null,"topics":["project","sbt","spark"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aTechGuide.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-03T09:50:18.000Z","updated_at":"2021-03-12T15:41:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"0ed4114f-0a5b-41b5-9172-a2923d4319dc","html_url":"https://github.com/aTechGuide/nyc-taxi-data-analysis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aTechGuide%2Fnyc-taxi-data-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aTechGuide%2Fnyc-taxi-data-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aTechGuide%2Fnyc-taxi-data-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aTechGuide%2Fnyc-taxi-data-analysis/manifests","owner_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/owners/aTechGuide","download_url":"https://codeload.github.com/aTechGuide/nyc-taxi-data-analysis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239858788,"owners_count":19708856,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["project","sbt","spark"],"created_at":"2024-11-08T07:18:27.239Z","updated_at":"2026-02-23T06:30:17.420Z","avatar_url":"https://github.com/aTechGuide.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NYC taxi Data Analysis\n\n# Tech Stack\n- Spark\n- Scala\n- sbt\n\n# Analysis\n- Which zones have the most pickup/drop-offs overall [MostPickupDropoffs.scala]\n- What are the peak hours for taxi [PeakHoursForTaxi.scala]\n- How are the trips distributed by length? Why are people taking the cab? [TripDistribution.scala]\n- What are the peak hours for long/short trips? [PeakHoursForLongShortTrips.scala]\n- What are the top 3 pick up and drop off zones for long/short trips? [TopPickUpAndDropOffForLongShortTrips.scala]\n- How are people paying for the rides, on long / short trips [PeoplePayingForLongShortTrips.scala]\n- How is the payment type evolving with time? [PaymentTypeEvolvingWithTime.scala]\n- Can we explore a ride-sharing opportunity by grouping close short trips? 
[RideSharingOppertunity.scala]\n\n# Data Sources\n- [www1.nyc.gov](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)\n- [academictorrents.com/](http://academictorrents.com/details/4f465810b86c6b793d1c7556fe3936441081992e)\n\n## Data Size\n\n- ~ 1.4 billion taxi rides between 2009 and 2016\n- ~ 400 GB uncompressed CSV\n- ~ 35 GB snappy parquet\n\n# References\nThis project was built as part of the [rockthejvm.com Spark Essentials with Scala](https://rockthejvm.com/p/spark-essentials) course.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fatechguide%2Fnyc-taxi-data-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fatechguide%2Fnyc-taxi-data-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fatechguide%2Fnyc-taxi-data-analysis/lists"}