Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/gayanukabulegoda/web-scraping-starter-kit

Repository designed to help freshers easily grasp the basics of web scraping, offering simple guides and examples to build a strong foundation.
python python-web-scraper python3 scrape scraping scraping-python web-scraper web-scraping web-scraping-python web-scraping-tutorials web-scrapping

Last synced: about 2 months ago

# Web-Scraping-Starter-Kit

A repository designed to help freshers grasp the basics of web scraping. This kit provides simple guides and examples to build a strong foundation in web scraping.

## Repository Contents

This repository includes four essential Python scripts for web scraping:

1. **`Web.py`**
This script introduces the basics of web scraping. It captures and prints data from a website to the terminal.

2. **`WebDataToExcel.py`**
This script extracts data from a website and saves it to an Excel sheet, with two columns: Heading and Content.

3. **`WebImgToFolder.py`**
This script retrieves image source paths via web scraping and downloads the images, saving them to a specified folder.

4. **`PaginatedDataSetToExcel.py`**
This script scrapes data from a paginated site and saves it to an Excel sheet with seven separate columns, organized page by page.
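
The idea behind `Web.py` and `WebDataToExcel.py` (read a page, pull out heading and content pairs) can be sketched as below. This is not the code from those scripts: it is a minimal illustration that uses only the standard library's `html.parser` so it runs without installing anything, whereas the Dependencies section suggests the repository relies on `requests` and `beautifulsoup4`. The class `HeadingContentParser`, the function `extract_rows`, the sample HTML, and the choice of `<h2>`/`<p>` tags are all assumptions made for the example.

```python
from html.parser import HTMLParser


class HeadingContentParser(HTMLParser):
    """Collect (heading, content) pairs: each <h2> paired with the next <p>."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished (heading, content) pairs
        self._tag = None       # tracked tag currently being read, if any
        self._heading = None   # last heading still waiting for its paragraph
        self._buffer = []      # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "p"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        # Collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._buffer).split())
        if tag == "h2":
            self._heading = text
        elif self._heading is not None:
            self.rows.append((self._heading, text))
            self._heading = None
        self._tag = None


def extract_rows(html):
    """Return a list of (heading, content) tuples found in the HTML string."""
    parser = HeadingContentParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample page; a real script would download the HTML first.
SAMPLE_HTML = """
<html><body>
  <h2>First heading</h2><p>First paragraph.</p>
  <h2>Second heading</h2><p>Second   paragraph.</p>
</body></html>
"""

if __name__ == "__main__":
    for heading, content in extract_rows(SAMPLE_HTML):
        print(f"{heading}: {content}")
```

Each tuple maps naturally onto the two columns (Heading and Content) described for `WebDataToExcel.py`.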

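For the paginated case handled by `PaginatedDataSetToExcel.py`, the usual pattern is a loop that requests page 1, 2, 3, and so on until a page comes back empty. The sketch below shows only that loop and is not taken from the script itself. The names `scrape_all_pages`, `fake_fetch`, and `FAKE_SITE` are hypothetical, and the in-memory "site" stands in for real HTTP requests so the example runs offline.

```python
def scrape_all_pages(fetch_page, max_pages=50):
    """Call fetch_page(1), fetch_page(2), ... until a page yields no rows.

    fetch_page is any callable that takes a page number and returns a list
    of dicts (one dict per scraped row). max_pages is a safety cap.
    """
    all_rows = []
    for page_number in range(1, max_pages + 1):
        rows = fetch_page(page_number)
        if not rows:
            break  # an empty page is treated as the end of the data set
        # Tag every row with its page so the output stays organized page by page.
        all_rows.extend({"page": page_number, **row} for row in rows)
    return all_rows


# Hypothetical stand-in for a real site: two pages of data, then nothing.
FAKE_SITE = {
    1: [{"title": "A"}, {"title": "B"}],
    2: [{"title": "C"}],
}


def fake_fetch(page_number):
    """Return the rows for one page, or an empty list past the last page."""
    return FAKE_SITE.get(page_number, [])


if __name__ == "__main__":
    for row in scrape_all_pages(fake_fetch):
        print(row)
```

In a real script, `fetch_page` would download and parse one page URL. The resulting list of dicts can be passed to `pandas.DataFrame(rows).to_excel("output.xlsx", index=False)`, which typically also needs an Excel engine such as `openpyxl` installed; the file name here is only an example.
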
## How to Use

1. **Clone the Repository**
   ```bash
   git clone https://github.com/gayanukabulegoda/Web-Scraping-Starter-Kit.git
   ```

2. **Navigate to the Project Directory**
   ```bash
   cd Web-Scraping-Starter-Kit
   ```

3. **Run the Scripts**
   - **For `Web.py`:**
     ```bash
     python Web.py
     ```
   - **For `WebDataToExcel.py`:**
     ```bash
     python WebDataToExcel.py
     ```
   - **For `WebImgToFolder.py`:**
     ```bash
     python WebImgToFolder.py
     ```
   - **For `PaginatedDataSetToExcel.py`:**
     ```bash
     python PaginatedDataSetToExcel.py
     ```

## Dependencies
Ensure you have the required Python libraries installed. You can install them using pip:
```bash
pip install requests beautifulsoup4 pandas
```
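
To confirm the installation worked before running the scripts, a small check like the following can help. It is an optional sketch, not part of the repository; the function name `missing_packages` and the `REQUIRED` mapping are assumptions for the example. Note that the pip package `beautifulsoup4` is imported under the module name `bs4`.

```python
import importlib.util

# Map pip package names to the module names used in import statements.
REQUIRED = {"requests": "requests", "beautifulsoup4": "bs4", "pandas": "pandas"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```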

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Contact

For any questions or inquiries, please contact me via [LinkedIn](https://www.linkedin.com/in/gayanuka-bulegoda-2b993127a).

© 2024 Gayanuka Bulegoda