https://github.com/petrsevcik/scraping_template
Template for scraping
- Host: GitHub
- URL: https://github.com/petrsevcik/scraping_template
- Owner: petrsevcik
- Created: 2024-05-06T12:22:41.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-05-06T13:34:08.000Z (8 months ago)
- Last Synced: 2024-11-07T22:32:24.516Z (about 2 months ago)
- Language: Python
- Size: 2.93 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Scraping practical task
## General Info
Scrape toothpaste data from the Notino website and transform it.
- use Python
- using the file structure in this repository is recommended but not mandatory; it is up to your creativity
- you can use any libraries you want. The output should be a CSV file.
- *clone this repository and, once you are done with the task, send a link to your GitHub repository or your solution to [email protected]*

## File structure
`abstract/abstract_scraper.py` - Base class for scrapers.
- implement a logger and methods for sending GET and POST requests
- you can implement any other methods and features you consider useful

`notino/sraper.py` - Scraper for Notino - raw data.
- Choose any language version of Notino and scrape toothpastes
- https://www.notino.cz/zubni-pasty/ // https://www.notino.co.uk/toothpaste/ // https://www.notino.de/zahnpasten/ ... Notino is present in 28 countries
- get info about products | mandatory: product name, brand, price, url, image
- any additional info is welcome
- save the result to the CSV file `notino_raw.csv`

`notino/transformation.py` - Transformation of raw data to final format
- add country (str), currency (str) and scraped_at (datetime) columns
- add a discount amount column (the difference between the price before a sale or promo code and the current price)
- save the result to the CSV file `notino_transformed.csv`
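
The transformation step could look roughly like the sketch below. It is only one possible approach and uses just the standard library. The column names `price` and `price_before_discount`, the function names `transform_rows` and `transform_file`, and the choice to treat a missing original price as "no discount" are assumptions for illustration; the task itself does not prescribe them.

```python
import csv
from datetime import datetime, timezone


def transform_rows(rows, country, currency, scraped_at=None):
    """Return copies of rows with country, currency, scraped_at and discount_amount added."""
    # Default to the current UTC time when no scrape timestamp is supplied.
    scraped_at = scraped_at or datetime.now(timezone.utc)
    transformed = []
    for row in rows:
        price = float(row["price"])
        # Assumption: an empty or missing original price means the product is not on sale.
        original = row.get("price_before_discount") or row["price"]
        new_row = dict(row)
        new_row["country"] = country
        new_row["currency"] = currency
        new_row["scraped_at"] = scraped_at.isoformat()
        new_row["discount_amount"] = round(float(original) - price, 2)
        transformed.append(new_row)
    return transformed


def transform_file(src, dst, country, currency):
    """Read the raw CSV, add the extra columns, and write the transformed CSV."""
    with open(src, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    out = transform_rows(rows, country, currency)
    if not out:
        return  # nothing to write for an empty input file
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(out[0].keys()))
        writer.writeheader()
        writer.writerows(out)
```

With these assumptions, a call such as `transform_file("notino_raw.csv", "notino_transformed.csv", "uk", "GBP")` would produce the final file. Keeping the row logic in `transform_rows` makes it easy to test without touching the file system.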