Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nuhmanpk/Webtrench
A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.
audio-datasets data data-collection data-science dataset-generation deep-learning image-data-generator machine-learning python scarper text-datasets
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/nuhmanpk/Webtrench
- Owner: nuhmanpk
- License: mit
- Created: 2023-02-11T17:47:54.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-19T07:55:20.000Z (12 months ago)
- Last Synced: 2024-07-16T06:22:46.368Z (4 months ago)
- Topics: audio-datasets, data, data-collection, data-science, dataset-generation, deep-learning, image-data-generator, machine-learning, python, scarper, text-datasets
- Language: Python
- Homepage: https://pypi.org/project/Webtrench/
- Size: 51.8 KB
- Stars: 20
- Watchers: 2
- Forks: 5
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
## Webtrench
WebTrench provides a comprehensive and powerful toolkit for web scraping. Whether you're working on a machine learning project, conducting research, or simply need to gather data from the web, WebTrench is the perfect tool for the job. So why wait? Start using WebTrench today and streamline your data collection process!
```shell
pip install Webtrench
```
-----
### Check Documentation [Here](https://github.com/nuhmanpk/Webtrench/wiki)
------
[![Downloads](https://static.pepy.tech/personalized-badge/webtrench?period=total&units=international_system&left_color=grey&right_color=yellow&left_text=Total-Downloads)](https://pepy.tech/project/webtrench)
![PyPI - Format](https://img.shields.io/pypi/format/Webtrench)
[![GitHub license](https://img.shields.io/github/license/nuhmanpk/webtrench.svg)](https://github.com/nuhmanpk/webtrench/blob/main/LICENSE)
[![Upload Python Package](https://github.com/nuhmanpk/Webtrench/actions/workflows/python-publish.yml/badge.svg)](https://github.com/nuhmanpk/Webtrench/actions/workflows/python-publish.yml)
[![Supported Versions](https://img.shields.io/pypi/pyversions/Webtrench.svg)](https://pypi.org/project/Webtrench)
![PyPI](https://img.shields.io/pypi/v/Webtrench)
[![Documentation Status](https://readthedocs.org/projects/webtrench/badge/?version=latest)](https://webtrench.readthedocs.io/en/latest/?badge=latest)
![PyPI - Downloads](https://img.shields.io/pypi/dm/Webtrench)
[![Downloads](https://static.pepy.tech/personalized-badge/Webtrench?period=week&units=international_system&left_color=grey&right_color=brightgreen&left_text=Downloads/Week)](https://pepy.tech/project/Webtrench)
## Why WebTrench
- Easy to use: With its simple and intuitive interface, WebTrench makes it easy to extract data from the web.
- Comprehensive: WebTrench includes functions for extracting a wide range of data, from images to tables and beyond.
- Fast and efficient: WebTrench is designed to be fast and efficient, so you can quickly gather the data you need.
- Suitable for a variety of use cases: Whether you're working on a machine learning project, conducting research, or simply need to gather data from the web, WebTrench is a versatile tool that can meet your needs.
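```python
from Webtrench import ImageScrapper

# Page to scrape and the local folder that will receive the downloaded images
url = 'https://example.com'
folder_path = './images'

# Download the images found at `url` into `folder_path`
ImageScrapper.all_image_from_url(url, folder_path)
```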
This code snippet downloads the images found on the page at https://example.com and saves them in the ./images folder, each with a random number as the file name.
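To give a rough idea of what collecting images from a page involves, here is a minimal standard-library sketch that extracts image URLs from an HTML string. It is not WebTrench's implementation: the names `ImageSrcCollector` and `collect_image_urls` are hypothetical, the HTML is an inline sample rather than a fetched page, and nothing is downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcCollector(HTMLParser):
    """Hypothetical helper: gathers the src of every <img> tag it sees."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths such as "/a.png" against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def collect_image_urls(html, base_url):
    """Return absolute image URLs found in `html`, in document order."""
    parser = ImageSrcCollector(base_url)
    parser.feed(html)
    return parser.image_urls


# Inline sample page (an assumption for illustration, not real site content).
sample_html = (
    '<html><body>'
    '<img src="/a.png">'
    '<img src="https://cdn.example.com/b.jpg">'
    '<img alt="no source">'
    '</body></html>'
)

print(collect_image_urls(sample_html, "https://example.com"))
```

A real scraper would then fetch each URL and write it to disk. Because this sketch relies on `<img src=...>` markup, it also illustrates why results depend on how a given site structures its pages.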
## Limitations of WebTrench
- Depends on website structure: The success of web scraping with WebTrench depends on the structure of the website being scraped. If the website's structure changes, WebTrench may not work as expected.
- Legal restrictions: There may be legal restrictions on the use of web scraping, so it's important to familiarize yourself with the laws in your jurisdiction before using WebTrench.
## Privacy Policy
WebTrench respects the privacy of its users and is committed to protecting their data. We do not collect or store any personal information, and all data collected through the use of WebTrench is kept confidential.
## Web Scraping Ethics
When using WebTrench or any other web scraping tool, it's important to follow ethical guidelines and avoid scraping websites without the owner's permission. This includes websites that explicitly prohibit scraping, as well as websites that contain sensitive or confidential information.
## Legal Warning
The use of web scraping may be subject to legal restrictions, and the legality of web scraping depends on the jurisdiction in which it is being used. Before using WebTrench, it's important to familiarize yourself with the laws in your jurisdiction and ensure that your use of the tool complies with all applicable laws. WebTrench cannot be held responsible for any illegal use of the tool.
## Contributing Guide
We welcome contributions from the community! If you are interested in contributing to the WebTrench project, here are some guidelines to get started:
- Check the [issues](https://github.com/nuhmanpk/Webtrench/issues) page to see if there are any open bugs or features that you would like to work on.
- Fork the repository and make your changes in a separate branch.
- Once you have made your changes, submit a pull request for review.
- The project maintainers will review your pull request and provide feedback. If necessary, make any requested changes and resubmit your pull request.
- Once your pull request is approved and merged, you will become a contributor to the WebTrench project!
### Project Clone Guide
If you would like to clone the WebTrench repository, follow these steps:
- Install Git on your computer.
- Open a terminal window and navigate to the directory where you would like to clone the repository.
- Run the following command:
```shell
git clone https://github.com/nuhmanpk/WebTrench.git
```
- The repository will be cloned to your computer, and you can now make changes to the code and contribute to the project.
## Reminder
Please note that WebTrench is currently in the pre-release stage and is not yet finished. If you encounter any issues, please check the [issues](https://github.com/nuhmanpk/Webtrench/issues) page, or consider contributing to make a better version of WebTrench!