https://github.com/vishal815/flight_details_scraper_and_visualization
This project focuses on scraping flight details from Google Flights, processing the data, and performing cleaning and visualization for future use in analytics or predictive modeling.
beautifulsoup end-to-end-project flight-data flight-data-analysis google-flights google-flights-scraper pandas pyhon scraping-project scraping-python selenium vishal-lazrus vishallazrus visualization webscraping webscraping-data
- Host: GitHub
- URL: https://github.com/vishal815/flight_details_scraper_and_visualization
- Owner: vishal815
- License: mit
- Created: 2025-01-15T18:00:53.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-01-21T00:11:33.000Z (4 months ago)
- Last Synced: 2025-01-30T20:57:02.911Z (4 months ago)
- Topics: beautifulsoup, end-to-end-project, flight-data, flight-data-analysis, google-flights, google-flights-scraper, pandas, pyhon, scraping-project, scraping-python, selenium, vishal-lazrus, vishallazrus, visualization, webscraping, webscraping-data
- Language: Jupyter Notebook
- Homepage: https://github.com/vishal815/Flight_Details_Scraper_and_Visualization/blob/main/scrape_flight_details_and_visualization.ipynb
- Size: 325 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Flight Data Scraper and Visualization
## Overview
This project scrapes flight details from [Google Flights](https://www.google.com/travel/flights?gl=IN&hl=en), then cleans and visualizes the data for later use in analytics or predictive modeling.
## Workflow step-by-step
## Features
- **Web Scraping**: Extracts detailed flight information, including:
  - Airline names
  - Flight duration
  - Price
  - Departure and arrival times
  - Departure and arrival dates
  - Airports
  - Stops
  - CO2 emissions
- **Data Cleaning**: Processes and filters scraped data to handle missing or unavailable values.
- **CSV Outputs**: Saves the data at each stage for easy access and further use.
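
As an illustration of the data-cleaning feature, here is a minimal sketch of turning scraped text fields into numbers. It assumes raw values shaped like `₹27,515`, `11 hr 30 min`, and `266 kg CO2e`, and that prices are whole amounts. The helper names are hypothetical and are not taken from the notebook.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[int]:
    """Return the price as an integer, or None when the text has no digits."""
    # Assumption: prices are whole amounts such as "₹27,515".
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def parse_duration_minutes(text: str) -> Optional[int]:
    """Convert a string such as '11 hr 30 min' to total minutes."""
    match = re.fullmatch(r"\s*(?:(\d+)\s*hr)?\s*(?:(\d+)\s*min)?\s*", text)
    if not match or not any(match.groups()):
        return None
    hours, minutes = (int(group) if group else 0 for group in match.groups())
    return hours * 60 + minutes


def parse_co2_kg(text: str) -> Optional[int]:
    """Extract the kilogram figure from a string such as '266 kg CO2e'."""
    match = re.search(r"([\d,]+)\s*kg", text)
    return int(match.group(1).replace(",", "")) if match else None
```

With pandas, each helper could be applied to a column, for example `df["Price"].map(parse_price)`. Rows where the result is missing could then be written to a separate file from the cleaned rows.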
## Website
Data is scraped from [Google Flights](https://www.google.com/travel/flights?gl=IN&hl=en).
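
A minimal sketch of how the page might be loaded before parsing, assuming Chrome and a matching driver are available. The function names are hypothetical, and the notebook may load the page differently. The `gl` and `hl` query parameters are taken from the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/travel/flights"


def build_flights_url(region: str = "IN", language: str = "en") -> str:
    """Build the Google Flights URL with the gl/hl parameters used above."""
    return f"{BASE_URL}?{urlencode({'gl': region, 'hl': language})}"


def fetch_page_html(url: str) -> str:
    """Open the URL in headless Chrome and return the rendered page source."""
    # Imported here so build_flights_url can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: results load dynamically, so a real run probably
        # needs an explicit wait before reading the page source.
        return driver.page_source
    finally:
        driver.quit()
```

The returned HTML can then be passed to `BeautifulSoup(html, "html.parser")` for extraction.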
## Generated Files
1. `google_flights_data.csv`: Raw flight data scraped from the website.
2. `price_unavailable_data.csv`: Records where price information is unavailable.
3. `cleaned_google_flights_data.csv`: Cleaned and processed dataset ready for analysis.
## Technologies Used
- **Python**
- **Selenium**: For browser automation and interaction.
- **BeautifulSoup**: For parsing HTML and extracting data.
- **Pandas**: For data manipulation and cleaning.
## Steps to Run
1. **Install Dependencies**:
   ```bash
   pip install selenium beautifulsoup4 pandas
   ```
2. **Set Up ChromeDriver**:
   - Download the ChromeDriver version that matches your installed Chrome from [here](https://chromedriver.chromium.org/downloads).
   - Update the path to the driver in the notebook.
3. **Run the Notebook** (requires Jupyter, for example via `pip install notebook`):
   ```bash
   jupyter notebook scrape_flight_details_and_visualization.ipynb
   ```
4. **Check Outputs**:
   - The scraped data will be saved as CSV files in the working directory.
## Data Flow
1. **Scraping**:
   - Selenium navigates to the specified Google Flights URL.
   - BeautifulSoup parses the HTML to extract flight details.
2. **Data Cleaning**:
   - Processes columns like "Price" and "Arrival Time".
   - Saves filtered data to `cleaned_google_flights_data.csv`.
## Example Output
| Airline | Flight Duration | Price | Departure Time | Departure Date | Departure Airport | Arrival Time | Arrival Date | Arrival Airport | Stops | CO2 Emissions | Next Day Dispatcher |
|---------|-----------------|-------|----------------|----------------|-------------------|--------------|--------------|-----------------|-------|---------------|---------------------|
| IndiGo | 11 hr 30 min | ₹27,515 | 8:45 AM | Sun, Feb 2 | Jayprakash Narayan International Airport, Patna | 6:45 PM | Sun, Feb 2 | Zayed International Airport | 1 stop | 266 kg CO2e | 0 |
| IndiGo | 11 hr 15 min | ₹28,349 | 12:40 PM | Sun, Feb 2 | Jayprakash Narayan International Airport, Patna | 10:25 PM | Sun, Feb 2 | Zayed International Airport | 1 stop | 257 kg CO2e | 0 |
## Some Output of Visualization

## Future Work
- Automation using crontab on Linux.
- Collect more data for extended analysis.
- Implement advanced visualization techniques.
- Apply ML/DL models for predictive analytics, such as estimating missing flight prices.
## License
This project is licensed under the MIT License. See the LICENSE file for more details.

---
*For questions or contributions, feel free to open an issue or submit a pull request.*
## Vishal Lazrus