{"id":21649506,"url":"https://github.com/pyrypp/taxi_demand_prediction","last_synced_at":"2026-04-18T13:36:59.203Z","repository":{"id":264458101,"uuid":"893433420","full_name":"pyrypp/taxi_demand_prediction","owner":"pyrypp","description":"A website providing a real time forecast on taxi demand","archived":false,"fork":false,"pushed_at":"2024-12-15T16:56:50.000Z","size":35327,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-20T02:39:37.806Z","etag":null,"topics":["aws","full-stack","neural-network","postgresql"],"latest_commit_sha":null,"homepage":"https://taxipoint.streamlit.app/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pyrypp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-24T12:46:26.000Z","updated_at":"2024-12-15T16:56:54.000Z","dependencies_parsed_at":"2025-03-20T02:50:15.588Z","dependency_job_id":null,"html_url":"https://github.com/pyrypp/taxi_demand_prediction","commit_stats":null,"previous_names":["pyrypp/taxi_demand_prediction"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/pyrypp/taxi_demand_prediction","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyrypp%2Ftaxi_demand_prediction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyrypp%2Ftaxi_demand_prediction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyrypp%2Ftaxi_demand_prediction/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/pyrypp%2Ftaxi_demand_prediction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pyrypp","download_url":"https://codeload.github.com/pyrypp/taxi_demand_prediction/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyrypp%2Ftaxi_demand_prediction/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31971488,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T00:39:45.007Z","status":"online","status_checked_at":"2026-04-18T02:00:07.018Z","response_time":103,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","full-stack","neural-network","postgresql"],"created_at":"2024-11-25T07:31:43.833Z","updated_at":"2026-04-18T13:36:59.183Z","avatar_url":"https://github.com/pyrypp.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Taxi demand prediction app\n![banner_image](banner.jpg)\n\nAt Helsinki Airport, a taxi driver is almost guaranteed to find a customer. However, the number of arriving air passengers varies greatly depending on the day and time. 
The goal of this project was to determine the best times for a taxi driver to queue at the airport taxi station.\n\nUsing data from Finavia (airport operator) and Taxipoint (airport taxi traffic coordinator), it is possible to predict the demand for the next 24 hours.\n\nReal-time data collection from APIs and a simple neural network prediction script ran on the AWS cloud. A new prediction was provided every 15 minutes. An app visualizing the predictions was hosted on Streamlit. \n\n- Website: https://taxipoint.streamlit.app/\n- GitHub: https://github.com/pyrypp/taxipoint_streamlit\n\nData was collected for 6 months, and the prediction service was running for approximately one month. It was used by over 40 taxi drivers but was shut down due to high costs.\n\n## Demo\nThis is a real timelapse of the service in operation. On the left side there are past observations. On the right side is the prediction. The unit of the y-axis is the number of customers per 15 minutes.\n\nTraffic peaks are highlighted and the number of passengers in that peak time is indicated on top of the peak.\n[![demo_gif](/images/demo.gif)](https://taxipoint.streamlit.app/)\n\n## How it works\nAt Helsinki Airport the main taxi station fits approximately 30 cars. To avoid queues on the public road leading to the airport, there is a designated queuing area for taxis slightly further away. \n\nThere are automatic boom barriers controlling the flow of taxi traffic. There is also a website operated by Taxipoint, which shows the number of cars at the taxi station and the queuing area.\n\nWhen a car leaves the taxi station, a new car is free to enter.\n\n![map_image](/images/airport_map_3_lq.jpg)\n_Image: Google Maps_\n\n**A data scraper** script was built with Python to track the website. When a car left the taxi station, it was logged as one customer. 
The script also monitored the queue length at the queue area.\n\nData on the arriving flights was also collected every day through Finavia's own API.\n\nData on taxi rides and arriving flights was collected for six months. All data was stored in a PostgreSQL **database** on AWS RDS. Weather data from Ilmatieteen laitos was also incorporated in the training of the model and the predictions.\n\nThe next step was to build and train a prediction **model** to predict the number of taxi rides for the next 24 hours. After trying several methods, such as a seasonal naive model, SARIMA and a random forest model, a simple neural network model was chosen as it proved the most accurate.\n\nThe model had two hidden layers (512 + 256 units) and output a single value. The 24-hour prediction was made with a resolution of 15 minutes. A total of 97 models were trained, each predicting one step further into the future.\n\nThe model took as input ride data from the previous 24 hours and the scheduled flight arrivals and forecasted weather of the following 24 hours. A Python script running on AWS EC2 calculated a new **prediction** every 15 minutes and stored the prediction in the database.\n\nAnother script rendered an image of a **plot** visualizing the prediction. This image was stored on AWS S3.\n\nA **website** was built with Streamlit and hosted on Streamlit Community Cloud. 
It fetched the plot images from S3 and displayed them for the users.\n\n![diagram_image](/images/diagram.png)\n\n_Icons: Amazon, Streamlit_\n\n## What did I learn?\n- Deploying an end-to-end system on the cloud\n  - Learning to use EC2, RDS and S3\n  - Learning Linux and Cron\n  - Creating workarounds for memory limitations (+learning about memory leaks and garbage collection)\n  - Designing a simple system architecture\n- Learning about databases and PostgreSQL\n  - Datatypes\n  - Connecting in Python (Psycopg2, SQLAlchemy)\n- Real-time data collection through APIs and web scraping\n  - Libraries (Requests, BeautifulSoup)\n  - XML data and namespaces\n  - Regular expressions for data scraping (re)\n- Data analysis on time series data\n  - Group bys\n  - Seasonal decomposition\n- Classical time series prediction models\n  - Naive\n  - Seasonal naive\n  - Fast Fourier Transform\n  - ARMA\n  - ARIMA\n  - SARIMA\n- Basics of random forests\n- Basic use of the Savitzky–Golay filter\n  - Basically smoothing data by fitting low-degree polynomials without losing the original shape\n- Time series prediction with a neural network\n  - TensorFlow basics (input and output layers, dense layers, ReLU activation, different optimizers)\n  - Utilizing TensorBoard to optimize model architecture and tune hyperparameters (number of layers and cells, learning rate, batch size)\n  - Creating the dataset using a sliding window technique\n  - Feature engineering (for example lag features and rolling statistics)\n  - Min-max scaling\n  - Utilizing vectorized operations to prepare data efficiently (for example pandas.DataFrame.shift)\n- Data visualization\n  - Plotly\n  - Automatic peak detection and coloring\n  - Design choices, for example focusing more on precise time than precise values by showing only vertical grid lines\n- Dividing code into functions and separate files\n- Working with real data and real users\n  - Fixing issues and bugs on the go\n  - Making the system robust to 
changes in data sources (for example by utilizing previous predictions if unable to create new ones)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpyrypp%2Ftaxi_demand_prediction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpyrypp%2Ftaxi_demand_prediction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpyrypp%2Ftaxi_demand_prediction/lists"}