https://github.com/mishaa931/web-scraping-dot-report-and-saving-data-to-csv

A script to scrape and extract articles about customers from a website. The data is then preprocessed for storage in a database for further analysis.

Host: GitHub
URL: https://github.com/mishaa931/web-scraping-dot-report-and-saving-data-to-csv
Owner: Mishaa931
Created: 2023-06-22T13:59:27.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-07-22T19:26:59.000Z (almost 2 years ago)
Last Synced: 2025-01-12T01:49:45.554Z (5 months ago)
Topics: datacollection, pythonscript, webscraping
Language: Python
Homepage:
Size: 2.93 KB
Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# Web Scraping DOT Report and Saving Data to CSV

This is a Python script that uses the Selenium library to perform web scraping on the DOT Report website (https://dot.report/usdot/KS/Wichita). The script extracts information about various companies in Wichita, Kansas, and saves the data into a CSV file.

## Prerequisites

Before running the script, make sure you have the following installed:

1. Python (https://www.python.org/)
2. Chrome web browser (https://www.google.com/chrome/)
3. ChromeDriver (https://sites.google.com/a/chromium.org/chromedriver/downloads)
4. Selenium library (https://pypi.org/project/selenium/)
5. Pandas library (https://pypi.org/project/pandas/)
6. Webdriver Manager library (https://pypi.org/project/webdriver-manager/)

You can install the required libraries using `pip`:

```
pip install selenium
pip install pandas
pip install webdriver-manager
```

## How to Use

1. Clone or download the repository containing the script from GitHub.
2. Ensure that the ChromeDriver executable is in the system's PATH or provide the path explicitly in the `webdriver.Chrome()` call.
3. Run the Python script.
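
Step 2 mentions supplying the driver path when creating the browser. As a minimal sketch, assuming Selenium 4 and the `webdriver-manager` package from the prerequisites, the driver could be created as follows. The helper name `build_driver` and its `driver_path` parameter are hypothetical and are not taken from the repository's script.

```python
def build_driver(driver_path=None):
    """Return a Chrome WebDriver (hypothetical helper, not from the repository).

    If driver_path is None, webdriver-manager downloads a matching ChromeDriver;
    otherwise the given executable is used.
    """
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    if driver_path is None:
        from webdriver_manager.chrome import ChromeDriverManager

        # install() returns the path of the downloaded driver executable.
        driver_path = ChromeDriverManager().install()

    # Selenium 4 takes the driver location through a Service object.
    return webdriver.Chrome(service=Service(executable_path=driver_path))
```

Calling `build_driver()` with no argument relies on `webdriver-manager`, in which case a manually installed ChromeDriver on the PATH should not be needed.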

The script will launch a Chrome browser and navigate to the DOT Report website. It will collect links to various companies' detail pages and then extract data from each company's page, including Company Name, DOT Number, Address, Phone Number, City, State, and Zip. The scraped data will be saved to a CSV file in the same directory as the script.
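
The saving step can be sketched roughly as below. This is an illustration only: it uses the standard-library `csv` module in place of Pandas so that it runs without extra dependencies, and the function name `save_records` is an assumption, not the repository's code. The column names come from the field list above.

```python
import csv
from pathlib import Path

# Column order taken from the fields listed in the description above.
FIELDS = [
    "Company Name",
    "DOT Number",
    "Address",
    "Phone Number",
    "City",
    "State",
    "Zip",
]


def save_records(records, path):
    """Write a list of per-company dicts to a CSV file with a header row.

    Fields missing from a record are written as empty strings.
    """
    with Path(path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, restval="")
        writer.writeheader()
        writer.writerows(records)
```

Each scraped company page would contribute one dict to `records`, for example `{"Company Name": "...", "DOT Number": "..."}`, and the whole list would be written once at the end of the run.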

Note: The script includes a delay using `time.sleep(5)` to ensure the pages load properly before extracting data. You may adjust this delay as needed.
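
As an alternative to a fixed delay, a page-ready check can be polled until it succeeds. The sketch below is a generic helper under that assumption; `wait_until` and its parameters are hypothetical names, not part of the script. Selenium itself provides `WebDriverWait` in `selenium.webdriver.support.ui` for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Pause briefly before checking again.
        time.sleep(poll_interval)
```

With this approach the script waits only as long as each page actually needs, instead of always pausing for the full five seconds.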

## Disclaimer

This script is meant for educational and non-commercial purposes. Please check the terms of use and scraping policies of the website you intend to scrape before using this script. Web scraping may be subject to legal restrictions in some cases.

## Contact

If you have any questions or feedback, feel free to contact me or raise an issue in the GitHub repository. Happy scraping!
