https://github.com/parisneo/scrapemaster

ScrapeMaster is a comprehensive Python library for web scraping that handles both simple and complex websites, offering features like text and image extraction, session management, and anti-bot circumvention techniques.

Host: GitHub
URL: https://github.com/parisneo/scrapemaster
Owner: ParisNeo
License: apache-2.0
Created: 2024-07-20T17:57:37.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-10-07T21:29:22.000Z (almost 2 years ago)
Last Synced: 2025-03-26T04:23:05.063Z (over 1 year ago)
Language: Python
Size: 18.6 KB
Stars: 9
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# ScrapeMaster

ScrapeMaster is a comprehensive Python library for web scraping that handles both simple and complex websites, offering features like text and image extraction, session management, and anti-bot circumvention techniques.

## Features

- Scrape text and images from websites
- Handle JavaScript-rendered content using Selenium
- Manage cookies and sessions for authenticated scraping
- Rotate user agents and use proxies to avoid detection
- Clean and format extracted data
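
To illustrate the user-agent rotation and proxy feature listed above, here is a minimal sketch built only on the Python standard library. It is not ScrapeMaster's own implementation or API: the names `USER_AGENTS`, `make_request_factory`, and `make_opener`, as well as the example user-agent strings, are assumptions made for illustration.

```python
import itertools
import urllib.request

# Hypothetical pool of user-agent strings (illustrative values only).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/125.0",
]


def make_request_factory(user_agents):
    """Return a function that builds requests, cycling through user agents."""
    pool = itertools.cycle(user_agents)

    def build(url):
        # Each call takes the next user agent from the cycle.
        return urllib.request.Request(url, headers={"User-Agent": next(pool)})

    return build


def make_opener(proxy_url=None):
    """Return an opener that routes HTTP and HTTPS traffic through a proxy if given."""
    handlers = []
    if proxy_url:
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    return urllib.request.build_opener(*handlers)
```

A request built with `make_request_factory(USER_AGENTS)` can then be passed to `make_opener(...).open(request)`; no network call happens until `open` is invoked.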

## Installation

You can install ScrapeMaster using pip:

```bash
pip install ScrapeMaster
```

## Quick Start

Here's a simple example of how to use ScrapeMaster:

```python
from scrapemaster import ScrapeMaster

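# Create a scraper for the target page.
scraper = ScrapeMaster('https://example.com')
# Collect text from <p> elements and images from <img> elements;
# the third argument names the output directory for images.
results = scraper.scrape_all('p', 'img', 'output_images')
print(results['texts'])       # extracted text
print(results['image_urls'])  # URLs of the images found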
```
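
To make the shape of `results` concrete, the following sketch extracts paragraph text and image URLs from an HTML string using only the standard library. It is an assumption-laden illustration, not ScrapeMaster's implementation: only the `texts` and `image_urls` keys come from the Quick Start above, while `TextAndImageParser` and `extract` are hypothetical names. It also normalizes whitespace, as a simple example of cleaning extracted data.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TextAndImageParser(HTMLParser):
    """Collect cleaned <p> text and absolute <img> URLs (illustrative only)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.texts = []
        self.image_urls = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Collapse runs of whitespace and drop empty paragraphs.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.texts.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def extract(html, base_url):
    """Return a dict shaped like the Quick Start's `results`."""
    parser = TextAndImageParser(base_url)
    parser.feed(html)
    parser.close()
    return {"texts": parser.texts, "image_urls": parser.image_urls}
```

Calling `extract(page_html, 'https://example.com')` on an already downloaded page returns a dictionary with the two keys printed in the Quick Start.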

## Advanced Usage

For more advanced usage, including handling of JavaScript-rendered content and authenticated scraping, please refer to the documentation.
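
As a rough idea of what authenticated scraping involves, here is a standard-library sketch of a cookie-preserving session and a form login request. It does not use ScrapeMaster's API, which this README does not describe; `make_session`, `build_login_request`, and the `username`/`password` form field names are hypothetical and will differ from site to site.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies across requests, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build a POST request carrying form-encoded credentials.

    The field names are assumptions; inspect the target site's login form.
    """
    data = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("utf-8")
    return urllib.request.Request(login_url, data=data, method="POST")
```

After `opener.open(build_login_request(...))` succeeds, any session cookies set by the server are kept in the jar and sent automatically on later `opener.open(...)` calls.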

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
