{"id":21203399,"url":"https://github.com/celebi-pkg/flight-analysis","last_synced_at":"2025-10-30T21:46:14.793Z","repository":{"id":154419092,"uuid":"626238956","full_name":"celebi-pkg/flight-analysis","owner":"celebi-pkg","description":"Python package to scrape flight data from Google Flights and analyzes prices. Can determine optimal flight from date, place, and price","archived":false,"fork":false,"pushed_at":"2024-10-27T07:29:09.000Z","size":77,"stargazers_count":159,"open_issues_count":14,"forks_count":43,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-05-17T20:05:33.911Z","etag":null,"topics":["data-science","google","pandas","planes","prediction","price-tracker","python"],"latest_commit_sha":null,"homepage":"https://kcelebi.github.io/flight-analysis/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/celebi-pkg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-11T04:25:23.000Z","updated_at":"2025-05-07T10:04:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"e7380af0-232b-4787-9959-0330c3e4d6f5","html_url":"https://github.com/celebi-pkg/flight-analysis","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/celebi-pkg/flight-analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/celebi-pkg%2Fflight-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/celebi-pkg%2Fflight-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/celebi-pkg%2Fflight-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/celebi-pkg%2Fflight-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/celebi-pkg","download_url":"https://codeload.github.com/celebi-pkg/flight-analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/celebi-pkg%2Fflight-analysis/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261060120,"owners_count":23103979,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","google","pandas","planes","prediction","price-tracker","python"],"created_at":"2024-11-20T20:23:41.170Z","updated_at":"2025-10-30T21:46:14.685Z","avatar_url":"https://github.com/celebi-pkg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![kcelebi](https://circleci.com/gh/celebi-pkg/flight-analysis.svg?style=svg)](https://circleci.com/gh/celebi-pkg/flight-analysis)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Live on PyPI](https://img.shields.io/badge/PyPI-1.2.0-brightgreen)](https://pypi.org/project/google-flight-analysis/)\n[![TestPyPI](https://img.shields.io/badge/PyPI-1.1.1--alpha.11-blue)](https://test.pypi.org/project/google-flight-analysis/1.1.1a11/)\n\n# Flight Analysis\n\nThis project provides tools and models for users to analyze, forecast, and collect data regarding flights and prices. 
Many features are currently in their initial stages or in development. The current features (as of 5/25/2023) are:\n\n- Detailed scraping and querying tools for Google Flights\n- Ability to store data locally or to SQL tables\n- Base analytical tools/methods for price forecasting/summary\n\nThe features in development are:\n\n- Models to demonstrate ML techniques on forecasting\n- Querying of advanced features\n- API for access to previously collected data\n\n## Table of Contents\n- [Overview](#Overview)\n- [Usage](#usage)\n- [Updates \u0026 New Features](#updates-\u0026-new-features)\n- [Real Usage](#real-usage) 😄\n\n\n## Overview\n\nFlight price calculation can either use newly scraped data (scrapes upon running it) or cached data that reports a price-change confidence determined by a trained model. Currently, many features of this application are in development.\n\n## Usage\n\nThe web scraping tool is currently functional only for scraping round-trip flights for a given origin, destination, and date range. It can be easily used in a script or a Jupyter notebook.\n\nNote that the following packages are **absolutely required** as dependencies:\n- tqdm\n- selenium **(make sure to update your [ChromeDriver](https://chromedriver.chromium.org)!)**\n- pandas\n- numpy\n\nYou can install it either by installing the Python package `google-flight-analysis`:\n\n\tpip install google-flight-analysis\n\nor by forking/cloning this repository. If you clone it, make sure to install the dependencies and update ChromeDriver to match your Google Chrome version.\n\n\tpip install -r requirements.txt\n\n\nThe main scraping function that makes up the backbone of most other functionalities is `Scrape()`. It also serves as a data object, preserving the flight information as well as metadata from your query. 
For Python package users, import as follows:\n\n\tfrom google_flight_analysis.scrape import *\n\nFor GitHub repository cloners, import as follows from the root of the repository:\n\n\tfrom src.google_flight_analysis.scrape import *\n\t#---OR---#\n\timport sys\n\tsys.path.append('src/google_flight_analysis')\n\tfrom scrape import *\n\n\nHere is some quick starter code to accomplish the basic tasks. Find more in the [documentation](https://kcelebi.github.io/flight-analysis/).\n\n\t# Keep the dates in format YYYY-mm-dd\n\tresult = Scrape('JFK', 'IST', '2023-07-20', '2023-08-20') # obtain our scrape object, which represents our query\n\tresult.type # This is in a round-trip format\n\tresult.origin # ['JFK', 'IST']\n\tresult.dest # ['IST', 'JFK']\n\tresult.dates # ['2023-07-20', '2023-08-20']\n\tprint(result) # get unqueried str representation\n\nA `Scrape` object represents a Google Flights query to be run. It maintains flights as a sequence of one or more one-way flights, each of which has an origin, destination, and flight date. The above object for a round-trip flight from JFK to IST is a sequence of JFK --\u003e IST, then IST --\u003e JFK. We can obtain the data as follows:\n\n\tScrapeObjects(result) # runs selenium through ChromeDriver, modifies result in place\n\tresult.data # returns pandas DF\n\tprint(result) # get queried representation of result\n\nYou can also scrape for one-way trips:\n\n\tresult = Scrape('JFK', 'IST', '2023-08-20')\n\tScrapeObjects(result)\n\tresult.data # see data\n\nYou can also scrape chain-trips, which are defined as a sequence of one-way flights that have no direct relation to each other, other than being in chronological order. 
\n\n\t# chain-trip format: origin, dest, date, origin, dest, date, ...\n\tresult = Scrape('JFK', 'IST', '2023-08-20', 'RDU', 'LGA', '2023-12-25', 'EWR', 'SFO', '2024-01-20')\n\tresult.type # chain-trip\n\tScrapeObjects(result)\n\tresult.data # see data\n\nYou can also scrape perfect-chains, which are defined as a sequence of one-way flights such that the destination of the previous flight is the origin of the next and the origin of the chain is the final destination of the chain (a cycle).\n\n\t# perfect-chain format: origin, date, origin, date, ..., first_origin\n\tresult = Scrape(\"JFK\", \"2023-09-20\", \"IST\", \"2023-09-25\", \"CDG\", \"2023-10-10\", \"LHR\", \"2023-11-01\", \"JFK\")\n\tresult.type # perfect-chain\n\tScrapeObjects(result)\n\tresult.data # see data\n\nYou can read more about the different types of trips in the documentation. Scrape objects can be added to one another to create larger queries, under the following conditions:\n\n1. The objects being added are the same type of trip (one-way, round-trip, etc.)\n2. The objects being added are either both unqueried or both queried\n\n## Updates \u0026 New Features\n\nThis package is undergoing a complete revamp, including its addition to PyPI. Documentation is being updated frequently; get in touch with any questions.\n\n\n\u003c!--\n## Cache Data\n\nThe caching system for this application is mainly designed to make the loading of data more efficient. For the moment, this component of the application hasn't been designed well for the public to easily use so I would suggest that most people leave it alone, or fork the repository and modify some of the functions to create folders in the destinations that they would prefer. 
The key caching functions are:\n\n- `cache_data`\n- `load_cached`\n- `iterative_caching`\n- `clean_cache`\n- `cache_condition`\n- `check_cached`\n\nAll of these functions are clearly documented in the `scraping.py` file.\n--\u003e\n\u003c!--## To Do\n\n- [x] Scrape data and clean it\n- [x] Testing for scraping\n- [x] Add scraping docs\n- [ ] Split Airlines\n- [ ] Add day of week as a feature\n- [ ] Support for Day of booking!! (\"Delayed by x hr\")\n- [ ] Detail most common airports and automatically cache\n- [ ] Algorithm to check over multiple days and return summary\n- [x] Determine caching method: wait for request and cache? periodically cache?\n- [ ] Model for observing change in flight price\n\t- Predict how much it'll maybe change\n- [ ] UI for showing flights that are 'perfect' to constraint / flights that are close to constraints, etc\n- [ ] Caching/storing data, uses predictive model to estimate how good this is\n\n--\u003e\n## Real Usage\n\nHere are some great flights I was able to find and actually booked when planning my travel/vacations:\n\n- NYC ➡️ AMS (May 9), AMS ➡️ IST (May 12), IST ➡️ NYC (May 23) | Trip Total: $611 as of March 7, 2022\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcelebi-pkg%2Fflight-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcelebi-pkg%2Fflight-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcelebi-pkg%2Fflight-analysis/lists"}