https://github.com/doridoro/book_to_scrape

Project 2 of the OpenClassrooms path - Book to Scrape -- extracts selected information from http://books.toscrape.com/index.html into a CSV file



Host: GitHub
URL: https://github.com/doridoro/book_to_scrape
Owner: DoriDoro
Created: 2022-12-03T18:43:33.000Z (over 3 years ago)
Default Branch: master
Last Pushed: 2024-03-09T09:16:59.000Z (over 2 years ago)
Last Synced: 2025-02-13T12:18:48.771Z (over 1 year ago)
Topics: console-tool, csv-export, python3, webscraping
Language: Python
Homepage:
Size: 112 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# Book to Scrape

## Description:
Project 2 of the OpenClassrooms path "Developer Python": Book to Scrape.
The program extracts the following information from http://books.toscrape.com into a CSV file:

- product_page_url
- universal_product_code (upc)
- title
- price_including_tax
- price_excluding_tax
- number_available
- product_description
- category
- review_rating
- image_url

This information is extracted for every single book, and the results are organised by the categories used on the website.
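
As an illustration of the export step, the fields listed above can be used as the header of a CSV file. The following is only a minimal sketch and not the code of this project: the helper `write_books`, the exact column spelling `universal_product_code` and the placeholder record are assumptions made for the example.

```python
import csv
import io

# Column names follow the field list above; the exact spelling used by
# the project's own CSV files is an assumption.
FIELDNAMES = [
    "product_page_url",
    "universal_product_code",
    "title",
    "price_including_tax",
    "price_excluding_tax",
    "number_available",
    "product_description",
    "category",
    "review_rating",
    "image_url",
]


def write_books(rows, stream):
    """Write an iterable of dict rows to `stream` as CSV with a header line.

    `write_books` is a hypothetical helper, not a function of this project.
    """
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder record for demonstration only, not real scraped data.
sample = {name: "" for name in FIELDNAMES}
sample["title"] = "Example Book"
sample["category"] = "Example Category"

# An in-memory buffer stands in for a file opened with newline="".
buffer = io.StringIO()
write_books([sample], buffer)
csv_text = buffer.getvalue()
```

When writing to a real file, opening it with `open(path, "w", newline="", encoding="utf-8")` avoids blank lines between rows on Windows.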

**An improved version (2.0) is available in branch `dev/version-2.0`**

## Installation:
Open a terminal and run:
1. `git clone https://github.com/DoriDoro/Book_to_Scrape.git`
2. `cd Book_to_Scrape`
3. `python -m venv venv`
4. `. venv/bin/activate` (on macOS/Linux) or `venv\Scripts\activate` (on Windows)
5. `pip install -r requirements.txt`

## Skills:
- Configuring a Python environment
- Managing data using the ETL process
- Using version control with Git and GitHub
- Applying the basics of Python programming
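
To make the ETL (extract, transform, load) idea more concrete, here is a possible sketch of the transform step, which cleans raw text scraped from a product page before it is written to CSV. It is not taken from this project: the function names are hypothetical, and the input formats (`£51.77`, `In stock (22 available)`, a `star-rating Three` CSS class) are assumptions about how the website presents these values.

```python
import re

# Assumed mapping from the rating word in the CSS class to a number.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(text):
    """Turn a price string such as '£51.77' into a float."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group())


def parse_availability(text):
    """Extract the stock count from text like 'In stock (22 available)'.

    Returns 0 when no count is present.
    """
    match = re.search(r"\((\d+) available\)", text)
    return int(match.group(1)) if match else 0


def parse_rating(css_classes):
    """Map a class list like ['star-rating', 'Three'] to an integer rating.

    Returns None when no known rating word is found.
    """
    for name in css_classes:
        if name in RATING_WORDS:
            return RATING_WORDS[name]
    return None
```

Keeping these conversions in small separate functions means the extract step (downloading and parsing pages) and the load step (writing CSV files) do not need to know about the text formats.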

## Visualisation of the project:
Start the program with `python3 main.py`.

In the terminal you will see:

![terminal](/images_Readme/Terminal.png)

The results are visible in the following folders:
1. the results of the images:

![images](/images_Readme/ResultsImages.png)
2. the csv files:

![csv](/images_Readme/ResultsResults.png)
