https://github.com/parisneo/scrapemaster
MultiScraper is a comprehensive Python library for web scraping that handles both simple and complex websites, offering features like text and image extraction, session management, and anti-bot circumvention techniques.
- Host: GitHub
- URL: https://github.com/parisneo/scrapemaster
- Owner: ParisNeo
- License: apache-2.0
- Created: 2024-07-20T17:57:37.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-07-27T00:09:45.000Z (5 months ago)
- Last Synced: 2024-08-31T23:37:13.361Z (4 months ago)
- Language: Python
- Size: 14.6 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# ScrapeMaster
ScrapeMaster is a comprehensive Python library for web scraping that handles both simple and complex websites, offering features like text and image extraction, session management, and anti-bot circumvention techniques.
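To make the idea of text and image extraction concrete, here is a minimal sketch that uses only the Python standard library. It is not ScrapeMaster's implementation: the class `TextAndImageParser` and the function `extract_texts_and_images` are hypothetical names chosen for illustration, and the result keys simply mirror the `texts` and `image_urls` keys used in the Quick Start example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TextAndImageParser(HTMLParser):
    """Collects the text of <p> elements and absolute URLs of <img> tags.

    Hypothetical helper for illustration; not part of ScrapeMaster.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.texts = []
        self.image_urls = []
        self._p_depth = 0   # > 0 while inside a <p> element
        self._buffer = []   # text fragments of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._p_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative image paths against the page URL
                self.image_urls.append(urljoin(self.base_url, src))

    def handle_endtag(self, tag):
        if tag == "p" and self._p_depth > 0:
            self._p_depth -= 1
            if self._p_depth == 0:
                # Collapse runs of whitespace and skip empty paragraphs
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.texts.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._p_depth > 0:
            self._buffer.append(data)


def extract_texts_and_images(html, base_url):
    """Return paragraph texts and image URLs found in an HTML string."""
    parser = TextAndImageParser(base_url)
    parser.feed(html)
    parser.close()
    return {"texts": parser.texts, "image_urls": parser.image_urls}
```

Calling `extract_texts_and_images(html, "https://example.com/page")` on a downloaded page returns a dictionary with the two lists. A real scraper would also need to fetch the page and handle malformed markup more defensively.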
## Features
- Scrape text and images from websites
- Handle JavaScript-rendered content using Selenium
- Manage cookies and sessions for authenticated scraping
- Rotate user agents and use proxies to avoid detection
- Clean and format extracted data
## Installation
You can install ScrapeMaster using pip:
```
pip install ScrapeMaster
```
## Quick Start
Here's a simple example of how to use ScrapeMaster:
```python
from scrapemaster import ScrapeMaster

scraper = ScrapeMaster('https://example.com')
results = scraper.scrape_all('p', 'img', 'output_images')
print(results['texts'])
print(results['image_urls'])
```
## Advanced Usage
For more advanced usage, including handling of JavaScript-rendered content and authenticated scraping, please refer to the documentation.
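As a rough illustration of what session management and user-agent rotation involve, the sketch below uses only the standard library. It does not show ScrapeMaster's API: `RotatingSession`, `build_request`, `fetch`, and the `USER_AGENTS` strings are all assumptions made for this example. JavaScript rendering with Selenium is not covered here.

```python
import itertools
import urllib.request
from http.cookiejar import CookieJar

# Placeholder user-agent strings (assumed values, not taken from ScrapeMaster)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0",
]


class RotatingSession:
    """Keeps cookies across requests and cycles through user agents.

    Hypothetical helper for illustration; not part of ScrapeMaster.
    """

    def __init__(self, user_agents, proxy=None):
        self._agents = itertools.cycle(user_agents)
        self.cookies = CookieJar()
        # The cookie processor stores cookies set by responses and
        # sends them back on later requests made through this opener.
        handlers = [urllib.request.HTTPCookieProcessor(self.cookies)]
        if proxy:
            handlers.append(
                urllib.request.ProxyHandler({"http": proxy, "https": proxy})
            )
        self.opener = urllib.request.build_opener(*handlers)

    def build_request(self, url):
        """Create a request that carries the next user agent in the cycle."""
        headers = {"User-Agent": next(self._agents)}
        return urllib.request.Request(url, headers=headers)

    def fetch(self, url, timeout=10):
        """Download a URL through the shared opener and return the raw bytes."""
        request = self.build_request(url)
        with self.opener.open(request, timeout=timeout) as response:
            return response.read()
```

With this sketch, `RotatingSession(USER_AGENTS).fetch(url)` sends each request with a different user agent in turn while reusing the same cookie jar, which is the basic mechanism behind staying logged in across pages. Passing `proxy="http://host:port"` routes traffic through a proxy.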
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.