An open API service indexing awesome lists of open source software.

https://github.com/anongecko/scrape


https://github.com/anongecko/scrape

Last synced: about 1 month ago
JSON representation

Awesome Lists containing this project

README

          

# Web Scraper Extraordinaire

A powerful, flexible, and easy-to-use web scraping tool for extracting data from websites.

**Table of Contents**

1. [Features](#features)
2. [Getting Started](#getting-started)
3. [Installation](#installation)
4. [Usage](#usage)
5. [Examples](#examples)
6. [Contributing](#contributing)
7. [License](#license)

## Features

* **Modular Design**: Easily customize and extend the scraper with your own plugins and scripts.
* **Robust Handling**: Automatically handles anti-scraping measures, rate limiting, and connection issues.
* **Multi-Threaded**: Quickly scrape large amounts of data with our built-in multi-threading capabilities.
* **Data Processing**: Clean, transform, and format your scraped data with our integrated data processing pipeline.
* **Support for Multiple Data Formats**: Save your scraped data in CSV, JSON, XML, or any other format you need.

## Getting Started

To get started with the web scraper, follow these simple steps:

1. Install the scraper using pip: `pip install -r requirements.txt`
2. Create a new instance of the scraper: `scraper = WebScraper()`
3. Define your scraping task: `scraper.add_task(url, selector, handler)`
4. Run the scraper: `scraper.run()`

## Installation

To install the web scraper, run the following command:

```bash
pip install -r requirements.txt
```

This will install all necessary dependencies and libraries.

## Usage

Here's an example of how to use the web scraper:

```python
from web_scraper import WebScraper

# Create a new instance of the scraper
scraper = WebScraper()

# Define a scraping task
scraper.add_task(
url="https://example.com",
selector=".title",
handler=lambda x: x.text.strip()
)

# Run the scraper
scraper.run()
```

This will scrape the title from the webpage and print the result.

## Examples

* Scrape all links on a webpage:

```python
scraper.add_task(
url="https://example.com",
selector="a",
handler=lambda x: x.get("href")
)
```

* Scrape all images on a webpage:

```python
scraper.add_task(
url="https://example.com",
selector="img",
handler=lambda x: x.get("src")
)
```

* Scrape data from a table:

```python
scraper.add_task(
url="https://example.com",
selector="table tr",
handler=lambda x: [td.text.strip() for td in x.find_all("td")]
)
```

## Contributing

We welcome all contributions! If you'd like to contribute to the project, please fork the repository and submit a pull request.

## License

This project is licensed under the MIT License. See LICENSE for details.

By using this scraper, you acknowledge that you have read and agree to the terms of the license.