https://github.com/wickenico/py-html-to-csv-converter
A Python tool for effortlessly converting HTML source code to CSV format with Selenium Webdriver.
csv-converter csv-parser html python3 selenium-python selenium-webdriver
- Host: GitHub
- URL: https://github.com/wickenico/py-html-to-csv-converter
- Owner: wickenico
- License: mit
- Created: 2024-01-13T23:47:23.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-14T23:36:09.000Z (about 1 year ago)
- Last Synced: 2024-11-11T11:36:01.961Z (3 months ago)
- Topics: csv-converter, csv-parser, html, python3, selenium-python, selenium-webdriver
- Language: Python
- Homepage:
- Size: 1.98 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Py-Html-to-csv-converter
This repository contains a Python script that scrapes web pages with Selenium and converts the HTML to CSV. The script is designed to be adapted to different scraping tasks on websites with dynamic content.
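To illustrate the HTML-to-CSV idea without a browser, here is a minimal sketch that uses only the Python standard library. It is not the code in scraper.py: the class `TableRowParser`, the function `html_to_csv`, and the sample table are hypothetical, and the static string stands in for the page source that Selenium would return after rendering.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Hypothetical input; with Selenium this would be the rendered page source.
sample_html = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget, large</td><td>9.99</td></tr>
</table>
"""
csv_text = html_to_csv(sample_html)
print(csv_text)
```

The `csv` module quotes the cell that contains a comma, which is the main reason to prefer it over joining strings by hand.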
## Script Structure
- scraper.py: The main Python script containing the web scraping functionality.
- main.py: Program to call and pass the parameters. Call with:
```
python3 main.py
```
- requirements.txt: List of Python libraries required for the script.
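The split between a parameter-passing entry point and a scraping module could be sketched as follows. This is an assumption about the general shape, not the repository's actual code: `ScrapeSettings`, `scrape`, `run`, the example URL, and the `h1` selector are all hypothetical names chosen for illustration. Selenium is imported inside `scrape` so the settings can be defined without a browser or driver available.

```python
import csv
from dataclasses import dataclass


@dataclass
class ScrapeSettings:
    """Hypothetical parameters that an entry point might pass to the scraper."""
    url: str
    content_selector: str
    driver_path: str = ""               # empty: let Selenium locate a driver itself
    output_filename: str = "output.csv"


def scrape(settings):
    """Open the page in Chrome and return one row per matching element."""
    # Imported here so this module loads even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    if settings.driver_path:
        driver = webdriver.Chrome(service=Service(executable_path=settings.driver_path))
    else:
        driver = webdriver.Chrome()
    try:
        driver.get(settings.url)
        elements = driver.find_elements(By.CSS_SELECTOR, settings.content_selector)
        return [[element.text] for element in elements]
    finally:
        driver.quit()  # always close the browser, even on errors


def run(settings):
    """Scrape the page and write the rows to the configured CSV file."""
    rows = scrape(settings)
    with open(settings.output_filename, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


# Example settings; the URL and selector are placeholders.
settings = ScrapeSettings(url="https://example.com", content_selector="h1")
```

Calling `run(settings)` would start Chrome, so it needs Selenium, Chrome, and a matching driver installed.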
## Getting Started
### Prerequisites
- [Python](https://www.python.org/) installed
- [Selenium](https://www.selenium.dev/) library installed (`pip install selenium`)
- Webdriver (e.g., [ChromeDriver](https://sites.google.com/chromium.org/driver/)) installed and its path set in the script
## Install
1. Clone this repository:
```
git clone https://github.com/wickenico/py-html-to-csv-converter.git
```
2. Install the requirements:
```
pip install -r requirements.txt
```
3. Download the [ChromeDriver](https://sites.google.com/a/chromium.org/chromedriver/downloads) that matches your Chrome version and put it in your PATH.
## Usage
### Script setup
Open scraper.py in your preferred text editor and update the following:
- Web Driver: Set the path to your preferred web driver (e.g., ChromeDriver) in the script.
- CSS Selectors: Customize the CSS selectors in the script to match the structure of the target website. Adjust the selectors used for button clicks, content extraction, and link identification.
- Output Filename: Optionally, change the output filename in the navigate_and_go_back function if needed.
### Output
- The scraped data will be stored in a CSV file named output.csv. Open this file using a spreadsheet application like Excel or Google Sheets for further analysis.
- If you encounter any issues or have suggestions for improvement, please create a [Pull Request](https://github.com/wickenico/py-html-to-csv-converter/pulls). Your feedback is valuable!
## Contributing
Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or create a pull request.
## LICENSE
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)