https://github.com/doridoro/book_to_scrape
Project 2 of Openclassrooms Path - Book to Scrape -- extract certain information of http://books.toscrape.com/index.html into a csv file
console-tool csv-export python3 webscraping
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/doridoro/book_to_scrape
- Owner: DoriDoro
- Created: 2022-12-03T18:43:33.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-03-09T09:16:59.000Z (over 2 years ago)
- Last Synced: 2025-02-13T12:18:48.771Z (over 1 year ago)
- Topics: console-tool, csv-export, python3, webscraping
- Language: Python
- Homepage:
- Size: 112 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Book to Scrape
## Description:
Project 2 of the OpenClassrooms Python Developer path: Book to Scrape.
The program extracts the following information from http://books.toscrape.com into a CSV file:
- product_page_url
- universal_product_code (upc)
- title
- price_including_tax
- price_excluding_tax
- number_available
- product_description
- category
- review_rating
- image_url
This information is extracted for every book, and the results are organised by the categories used on the website.
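As a minimal sketch of how such per-category CSV output could be produced with the standard library, the example below groups book dictionaries by their `category` value and writes one file per category. The function name `write_books_by_category`, the file naming scheme, and the sample data are hypothetical and are not taken from this repository; only the column names follow the field list above.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Column names follow the field list in the description.
FIELDNAMES = [
    "product_page_url",
    "universal_product_code",
    "title",
    "price_including_tax",
    "price_excluding_tax",
    "number_available",
    "product_description",
    "category",
    "review_rating",
    "image_url",
]


def write_books_by_category(books, output_dir):
    """Write one CSV file per category and return the created paths.

    Hypothetical helper for illustration; not the repository's own code.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Group the book dictionaries by their "category" value.
    grouped = defaultdict(list)
    for book in books:
        grouped[book["category"]].append(book)

    paths = []
    for category, rows in sorted(grouped.items()):
        # Assumed naming scheme: lower-cased category, spaces as underscores.
        path = output_dir / f"{category.lower().replace(' ', '_')}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
            writer.writeheader()
            # Keys missing from a row are written as empty cells.
            writer.writerows(rows)
        paths.append(path)
    return paths
```

Calling `write_books_by_category(books, "csv_files")` with a list of dictionaries keyed by the columns above would create files such as `travel.csv`; the folder name here is only an example.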
**An improved version (2.0) is available in branch `dev/version-2.0`**
## Installation:
Open a terminal and run:
1. `git clone https://github.com/DoriDoro/Book_to_Scrape.git`
2. `cd Book_to_Scrape`
3. `python -m venv venv`
4. `. venv/bin/activate` (on macOS/Linux) or `venv\Scripts\activate` (on Windows)
5. `pip install -r requirements.txt`
## Skills:
- Configuring a Python environment
- Managing data using the ETL process
- Using version control with Git and GitHub
- Applying the basics of Python programming
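To illustrate the "transform" step of an ETL process for this kind of data, here is a small sketch that cleans raw scraped strings before they are written out. It assumes the raw values look like `£51.77`, `In stock (22 available)`, and a CSS class string such as `star-rating Three`; these formats and all function names are assumptions made for the example, not code from this repository.

```python
import re

# Assumption: the rating is encoded as an English number word in a CSS class.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(text):
    """Turn a price string such as '£51.77' into a float."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group())


def parse_number_available(text):
    """Extract the stock count from text like 'In stock (22 available)'.

    Returns 0 when no count is present.
    """
    match = re.search(r"\((\d+) available\)", text)
    return int(match.group(1)) if match else 0


def parse_review_rating(css_class):
    """Map a class string like 'star-rating Three' to an integer.

    Returns 0 when no known rating word is found.
    """
    for word in css_class.split():
        if word in RATING_WORDS:
            return RATING_WORDS[word]
    return 0
```

Keeping these conversions in small, separate functions makes each one easy to test in isolation from the network requests of the extract step.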
## Visualisation of the project:
Start the program with `python3 main.py`.
In the terminal you will see:

The results are saved in the following folders:
1. the images:

2. the CSV files:
