https://github.com/wickenico/py-html-to-csv-converter

A Python tool for effortlessly converting HTML source code to CSV format with Selenium Webdriver.

csv-converter csv-parser html python3 selenium-python selenium-webdriver

Host: GitHub
URL: https://github.com/wickenico/py-html-to-csv-converter
Owner: wickenico
License: mit
Created: 2024-01-13T23:47:23.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-01-14T23:36:09.000Z (over 1 year ago)
Last Synced: 2025-01-09T11:44:57.422Z (6 months ago)
Topics: csv-converter, csv-parser, html, python3, selenium-python, selenium-webdriver
Language: Python
Homepage:
Size: 1.98 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Py-Html-to-csv-converter

This repository contains a Python script that scrapes web pages with Selenium and converts their HTML content to CSV. The script is designed to be adaptable to various scraping tasks on websites with dynamic content.

## Script Structure

- scraper.py: The main Python script containing the web scraping functionality.
- main.py: Program that calls the scraper and passes it the parameters.
Call with:

```
python3 main.py
```

- requirements.txt: List of Python libraries required for the script.

## Getting Started

### Prerequisites

- [Python](https://www.python.org/) installed
- [Selenium](https://www.selenium.dev/) library installed (`pip install selenium`)
- Webdriver (e.g., [ChromeDriver](https://sites.google.com/chromium.org/driver/)) installed and its path set in the script

## Install

1. Clone this repository:
```
git clone https://github.com/wickenico/py-html-to-csv-converter.git
```
2. Install the requirements:
```
pip install -r requirements.txt
```
3. Download the [ChromeDriver](https://sites.google.com/a/chromium.org/chromedriver/downloads) version that matches your installed Chrome version and place it in a directory on your PATH.
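
To confirm that step 3 worked, a quick check like the following may help. It is a minimal sketch that is not part of this repository; the helper name `find_driver` is hypothetical, and it assumes the executable is named `chromedriver`.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the absolute path of the driver executable on PATH, or None."""
    # shutil.which searches the directories listed in the PATH environment variable.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("chromedriver was not found on PATH")
    else:
        print(f"chromedriver found at {path}")
```

If the driver is reported as missing, check that the directory containing it is listed in your PATH.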

## Usage

### Script setup

Open scraper.py in your preferred text editor and update the following:

- Web Driver: Set the path to your preferred web driver (e.g., ChromeDriver) in the script.
- CSS Selectors: Customize the CSS selectors in the script to match the structure of the target website. Adjust the selectors used for button clicks, content extraction, and link identification.
- Output Filename: Optionally, change the output filename in the navigate_and_go_back function if needed.
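
The three settings above could look roughly like the sketch below. This is an illustration only, not the actual contents of scraper.py: the driver path, the selectors, and the function names `build_driver` and `extract_content` are all assumptions, and the Selenium calls follow the Selenium 4 API.

```python
# Hypothetical settings; none of these values come from scraper.py.
DRIVER_PATH = "/usr/local/bin/chromedriver"
SELECTORS = {
    "button": "button.load-more",  # element to click
    "content": "div.content",      # elements whose text is extracted
    "link": "a.detail-link",       # links to follow
}
OUTPUT_FILENAME = "output.csv"


def build_driver(driver_path=DRIVER_PATH):
    """Start Chrome with an explicit ChromeDriver path."""
    # Imported here so the settings can be inspected without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=driver_path))


def extract_content(driver, selector=SELECTORS["content"]):
    """Return the text of every element matching the CSS selector."""
    from selenium.webdriver.common.by import By

    elements = driver.find_elements(By.CSS_SELECTOR, selector)
    return [element.text for element in elements]
```

Keeping the selectors in one place makes it easier to adapt the script to a different target website.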

### Output

- The scraped data is written to a CSV file named output.csv, unless you changed the output filename during script setup. Open this file in a spreadsheet application such as Excel or Google Sheets for further analysis.
- If you encounter any issues or have suggestions for improvement, please create a [Pull Request](https://github.com/wickenico/py-html-to-csv-converter/pulls). Your feedback is valuable!
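
The CSV output can also be written and inspected with Python's standard `csv` module instead of a spreadsheet application. The sketch below is a hedged example under assumptions: the helper names `write_rows` and `read_rows` are hypothetical, and the real columns of output.csv depend on the selectors you configured.

```python
import csv


def write_rows(filename, header, rows):
    """Write a header line and data rows to a CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(filename, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_rows(filename):
    """Read a CSV file into a list of dicts keyed by the header line."""
    with open(filename, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `read_rows("output.csv")` returns one dictionary per scraped row, which is convenient for filtering or counting entries in a script.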

## Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or create a pull request.

## LICENSE

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
