https://github.com/infinitode/pywebscrapr
An open-source Python web scraping tool. Supports both image scraping and text scraping.
- Host: GitHub
- URL: https://github.com/infinitode/pywebscrapr
- Owner: Infinitode
- License: other
- Created: 2024-02-02T09:21:52.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-11-05T04:31:21.000Z (3 months ago)
- Last Synced: 2024-11-05T05:19:38.371Z (3 months ago)
- Topics: data, data-collection, data-science, open-source, pip, scraping, web-scraper
- Language: Python
- Homepage: https://infinitode.netlify.app
- Size: 14.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# PyWebScrapr
![Python Version](https://img.shields.io/badge/python-3.12-blue.svg)
[![Code Size](https://img.shields.io/github/languages/code-size/infinitode/pywebscrapr)](https://github.com/infinitode/pywebscrapr)
![Downloads](https://pepy.tech/badge/pywebscrapr)
![License Compliance](https://img.shields.io/badge/license-compliance-brightgreen.svg)
![PyPI Version](https://img.shields.io/pypi/v/pywebscrapr)

An open-source Python library for web scraping tasks. Includes support for both text and image scraping.
## Changes in 0.1.2
- `min` and `max` width and height parameters can now be set for image scraping, letting you exclude low-resolution images as well as very large images that take up too much space.
- PyWebScrapr now uses BeautifulSoup4's `SoupStrainer`, which makes extracting content from webpages much faster.

## Installation
You can install PyWebScrapr using pip:
```bash
pip install pywebscrapr
```

## Supported Python Versions
PyWebScrapr supports the following Python versions:
- Python 3.6
- Python 3.7
- Python 3.8
- Python 3.9
- Python 3.10
- Python 3.11
- Python 3.12 or later (preferred)

Please ensure that you have one of these Python versions installed before using PyWebScrapr. It may not work as expected on Python versions older than those listed.
## Features
- **Text Scraping**: Extract textual content from specified URLs.
- **Image Scraping**: Download images from specified URLs.

For a full list of features, see the [PyWebScrapr official documentation](https://infinitode-docs.gitbook.io/documentation/package-documentation/pywebscrapr-package-documentation).
## Usage
### Text Scraping
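Links can be supplied from a file, a list, or both. This README does not describe how PyWebScrapr combines the two, so the following is only a minimal sketch of one plausible approach. The helper name `collect_links` is hypothetical, and it assumes the links file holds one URL per line.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def collect_links(links_file: Optional[str] = None,
                  links_array: Optional[Iterable[str]] = None) -> List[str]:
    """Hypothetical helper: merge URLs from a text file and a list.

    Assumes one URL per line in the file. Blank lines are skipped and
    duplicates are dropped while keeping first-seen order.
    """
    candidates: List[str] = []
    if links_file is not None:
        # Read the whole file and split it into individual lines.
        candidates.extend(Path(links_file).read_text(encoding="utf-8").splitlines())
    if links_array is not None:
        candidates.extend(links_array)

    seen = set()
    links: List[str] = []
    for raw in candidates:
        url = raw.strip()
        if url and url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

The basic `scrape_text` call itself looks like this: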
```python
from pywebscrapr import scrape_text

# Specify links in a file or list
links_file = 'links.txt'
links_array = ['https://example.com/page1', 'https://example.com/page2']

# Scrape text and save to the 'output.txt' file
scrape_text(links_file=links_file, links_array=links_array, output_file='output.txt')
```

### Image Scraping
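Version 0.1.2 added minimum and maximum width and height limits for image scraping. The exact parameter names are not shown in this README, so the sketch below only illustrates the filtering idea. The function `within_size_limits` and its parameter names are hypothetical and are not PyWebScrapr's API.

```python
from typing import Optional


def within_size_limits(width: int, height: int,
                       min_width: Optional[int] = None,
                       min_height: Optional[int] = None,
                       max_width: Optional[int] = None,
                       max_height: Optional[int] = None) -> bool:
    """Hypothetical check: True if the dimensions fall inside all given limits.

    A limit of None means no limit on that side. Limits are inclusive.
    """
    if min_width is not None and width < min_width:
        return False
    if min_height is not None and height < min_height:
        return False
    if max_width is not None and width > max_width:
        return False
    if max_height is not None and height > max_height:
        return False
    return True


# Keep only images of at least 200x200 and at most 4000x4000 pixels.
sizes = [(64, 64), (800, 600), (6000, 4000)]
kept = [s for s in sizes
        if within_size_limits(*s, min_width=200, min_height=200,
                              max_width=4000, max_height=4000)]
```

With these limits, only the 800x600 entry is kept. The basic `scrape_images` call, without any size limits, looks like this: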
```python
from pywebscrapr import scrape_images

# Specify links in a file or list
links_file = 'image_links.txt'
links_array = ['https://example.com/image1.jpg', 'https://example.com/image2.png']

# Scrape images and save to the 'images' folder
scrape_images(links_file=links_file, links_array=links_array, save_folder='images')
```

## Contributing
Contributions are welcome! If you encounter any issues, have suggestions, or want to contribute to PyWebScrapr, please open an issue or submit a pull request on [GitHub](https://github.com/infinitode/pywebscrapr).
## License
PyWebScrapr is released under the terms of the **MIT License (Modified)**. Please see the [LICENSE](https://github.com/infinitode/pywebscrapr/blob/main/LICENSE) file for the full text.
**Modified License Clause**
The modified license clause permits derivative works based on the PyWebScrapr software. However, any substantial changes to the software must be clearly distinguished from the original work and distributed under a different name.