https://github.com/itsrummmy/books-to-scrape-webscraping-project
Fully scrape website acquiring specific bits of information.
beautifulsoup4 requests-library-python
- Host: GitHub
- URL: https://github.com/itsrummmy/books-to-scrape-webscraping-project
- Owner: Itsrummmy
- Created: 2025-01-13T09:15:46.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-01-13T22:00:53.000Z (9 months ago)
- Last Synced: 2025-06-05T12:06:13.755Z (4 months ago)
- Topics: beautifulsoup4, requests-library-python
- Language: Python
- Homepage:
- Size: 118 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Webscraping Project: Books To Scrape
## Project Overview
The goal is to develop a Python-based web scraping solution that extracts comprehensive data for every book listed on the target website, [Books to Scrape](https://books.toscrape.com/index.html).
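One possible way to reach every book is to walk the catalogue's listing pages and follow each "next" link until none remains. The sketch below is not taken from the repository; the function name `collect_book_urls`, the injected `fetch` callable, and the CSS selectors (`article.product_pod h3 a`, `li.next a`) are assumptions about the site's markup and should be checked against the live pages.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_book_urls(start_url, fetch):
    """Follow 'next' links from start_url and return every book page URL found.

    `fetch` is any callable that takes a URL and returns the page HTML as a
    string. With the real site it could be, for example:
        lambda url: requests.get(url, timeout=10).text
    """
    book_urls = []
    page_url = start_url
    while page_url:
        soup = BeautifulSoup(fetch(page_url), "html.parser")

        # Assumed markup: each book on a listing page is an
        # <article class="product_pod"> whose <h3> holds the link.
        for link in soup.select("article.product_pod h3 a"):
            book_urls.append(urljoin(page_url, link["href"]))

        # Assumed markup: the pager exposes <li class="next"><a href=...>.
        next_link = soup.select_one("li.next a")
        page_url = urljoin(page_url, next_link["href"]) if next_link else None
    return book_urls
```

Passing `fetch` in as a parameter keeps the crawl logic testable without network access, and `urljoin` resolves the relative links against whichever listing page they were found on.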
## Tools Used
* **Python**: For scraping the site and processing the extracted data.
* **Libraries**:
  * `pandas`
  * `beautifulsoup`
  * `requests`

## Deliverables
1) Automated Data Retrieval
2) Develop a web scraping solution that extracts the following fields for every book on the website and stores them in a single CSV file:
- `Book Name`
- `Book URL`
- `Stock Availability`
- `UPC`
- `Tax`
- `Number of Reviews`
- `Category Name`
- `Category Link`
- `Book Description`

## Outcomes
Successfully extracted and compiled a dataset of all 1000 books available on the target website, including key attributes such as name, price, availability, category, and detailed description. The data was saved to a single CSV file.
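As a rough illustration of how the fields listed under Deliverables might be pulled from a single book page, here is a minimal sketch using Beautiful Soup. It is not the repository's actual code: the function name `parse_book_page` is hypothetical, and the selectors (the product information table, the `ul.breadcrumb` links, the `product_description` element) are assumptions about the page markup.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def parse_book_page(html, page_url):
    """Return one row (a dict) with the deliverable fields for a book page."""
    soup = BeautifulSoup(html, "html.parser")

    # Assumed markup: a product information table whose rows pair a
    # header cell (<th>) with a value cell (<td>).
    info = {}
    for row in soup.select("table tr"):
        header, value = row.find("th"), row.find("td")
        if header and value:
            info[header.get_text(strip=True)] = value.get_text(strip=True)

    # Assumed markup: breadcrumb links run Home > Books > Category,
    # so the last link is the book's category.
    crumbs = soup.select("ul.breadcrumb li a")
    category = crumbs[-1] if crumbs else None

    # Assumed markup: the description is the first <p> that follows the
    # element with id="product_description".
    desc_header = soup.find(id="product_description")
    description = desc_header.find_next_sibling("p") if desc_header else None

    title = soup.find("h1")
    return {
        "Book Name": title.get_text(strip=True) if title else None,
        "Book URL": page_url,
        "Stock Availability": info.get("Availability"),
        "UPC": info.get("UPC"),
        "Tax": info.get("Tax"),
        "Number of Reviews": info.get("Number of reviews"),
        "Category Name": category.get_text(strip=True) if category else None,
        "Category Link": urljoin(page_url, category["href"]) if category else None,
        "Book Description": description.get_text(strip=True) if description else None,
    }
```

Rows produced this way can be collected in a list and written out in one step, for example with `pandas.DataFrame(rows).to_csv("books.csv", index=False)`, where `books.csv` is a placeholder filename.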
## Conclusion
This project demonstrated how to scrape and extract comprehensive data from a target website using Python and the Beautiful Soup library.