Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nafisrayan/webscraper
A web scraping tool powered with AI
- Host: GitHub
- URL: https://github.com/nafisrayan/webscraper
- Owner: NafisRayan
- Created: 2024-02-28T08:59:28.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-08-27T05:07:20.000Z (3 months ago)
- Last Synced: 2024-08-27T06:25:13.783Z (3 months ago)
- Language: Python
- Size: 11.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# WebAI
This Streamlit application offers a straightforward interface for web scraping. Users can enter a website URL, and the provided code utilizes BeautifulSoup and Pandas to extract and display various forms of content such as text, links, and images from the specified website.
## Getting Started
- To begin, ensure you have the necessary packages installed:
  - `streamlit`
  - `pandas`
  - `requests`
  - `beautifulsoup4`
  - `pdfkit`
- Input the desired website URL into the designated text field to initiate the scraping process.
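
As a quick way to confirm the packages listed above are installed, a small check like the following could be run first. This is only a sketch: the `missing_packages` helper and the `REQUIRED` mapping are hypothetical names, not part of the project. Note that `beautifulsoup4` is imported as `bs4`, and that `pdfkit` is a wrapper around the separate `wkhtmltopdf` program, which this check does not detect.

```python
from importlib.util import find_spec

# Maps each pip package name from the list above to the module name it is
# imported as. beautifulsoup4 is the one case where the two names differ.
REQUIRED = {
    "streamlit": "streamlit",
    "pandas": "pandas",
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "pdfkit": "pdfkit",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```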
## Purpose
The primary objective of this application is to illustrate the process of web scraping using Python libraries like BeautifulSoup for parsing web content and Pandas for organizing extracted data.
## Functionality
### Input
Users are prompted to input the URL of the website they wish to scrape. The initial URL is set as 'https://bbc.com' for demonstration purposes.
### Scraping Logic
Upon entering the URL and clicking the "Scrape Data" button, the application sends an HTTP request to the specified URL and proceeds to parse its content using BeautifulSoup. The scraping logic involves extracting paragraphs, anchors (links), and images from the website.
### Displaying Data
The scraped data, including text, links, and images, are presented to the user in a tabular format, courtesy of Pandas.
### PDF Generation
Following the data extraction, users have the option to download the scraped data as a PDF by clicking the "Download as PDF" button.
## Implementation
The application is implemented using Streamlit, which provides a simple and intuitive method for creating web applications with Python.
## Usage
1. Clone the provided code.
2. Install the necessary dependencies.
3. Run the application with Streamlit and input a desired website URL for scraping.
4. After scraping, the data can be downloaded as a PDF for further use.
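
The scraping and display steps described under Functionality can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `fetch_html` and `extract_content`, the column names, the timeout, and the sample HTML are all assumptions made for the example. Resolving relative links with `urljoin` is also an addition that the README does not mention.

```python
from urllib.parse import urljoin

import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_content(html: str, base_url: str) -> dict[str, pd.DataFrame]:
    """Parse HTML and collect paragraphs, links, and images into DataFrames."""
    soup = BeautifulSoup(html, "html.parser")

    # Paragraph text, skipping <p> tags that contain only whitespace.
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    paragraphs = [text for text in paragraphs if text]

    # Anchor targets, resolved against the page URL so relative links are usable.
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]

    # Image sources, resolved the same way.
    images = [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]

    return {
        "text": pd.DataFrame({"text": paragraphs}),
        "links": pd.DataFrame({"link": links}),
        "images": pd.DataFrame({"image": images}),
    }


if __name__ == "__main__":
    # Hypothetical sample page, used so the sketch runs without network access.
    # For a live page: extract_content(fetch_html(url), url)
    sample = '<p>Hello</p><a href="/about">About</a><img src="logo.png">'
    tables = extract_content(sample, "https://example.com/news/")
    for name, frame in tables.items():
        print(name, len(frame))
```

In a Streamlit script, each returned DataFrame could be shown with `st.dataframe`. For the PDF step, one option would be to pass `frame.to_html()` to `pdfkit.from_string`, which requires the `wkhtmltopdf` program to be installed. How the project itself wires these steps together is not shown in this README.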