https://github.com/euberdeveloper/pastauctions-vavato-scraper
- Host: GitHub
- URL: https://github.com/euberdeveloper/pastauctions-vavato-scraper
- Owner: euberdeveloper
- Created: 2024-02-28T21:53:32.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-29T16:31:49.000Z (over 1 year ago)
- Last Synced: 2024-02-29T18:42:58.028Z (over 1 year ago)
- Language: Python
- Size: 71.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# pastauctions-vavato-scraper
A web scraper that extracts the content of some car auctions from Vavato.

## How to use it
Note: you will need Python and Pipenv installed on your system.
1. Clone the repository
2. Install the dependencies with `pipenv install`
3. Run the script with `pipenv run python main.py`

Some adjustments:
- You should change the destination folder `save_path_prefix`
- You can filter the "categories" of auctions from the variable `allowed_auctions_roots`
- You can change the request delay, to avoid being blocked for sending too many requests, via the variable `request_delay`
- If a block does occur, the number of seconds to wait before retrying can be changed via the variable `retry_delay`; it doubles at every retry. A configuration sketch follows this list.
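For illustration, here is a minimal sketch of how these variables might fit together. The four variable names are the ones listed above; the helper function and its retry loop are hypothetical and only demonstrate the described behavior (the delay doubling at every retry):

```python
import time

import requests

# Variable names from the list above; the values here are illustrative.
save_path_prefix = "./output"        # destination folder for the results
allowed_auctions_roots = ["cars"]    # filter which auction "categories" are scraped
request_delay = 1.0                  # seconds to wait between requests
retry_delay = 30.0                   # initial wait after a block; doubles at every retry

def fetch_with_retry(url: str, max_retries: int = 5) -> requests.Response:
    """Hypothetical helper: GET a page, backing off exponentially when blocked."""
    delay = retry_delay
    for _ in range(max_retries):
        response = requests.get(url)
        if response.ok:
            time.sleep(request_delay)  # throttle to avoid being blocked
            return response
        time.sleep(delay)  # blocked: wait before retrying
        delay *= 2         # the retry delay doubles at every retry
    raise RuntimeError(f"Giving up on {url} after {max_retries} retries")
```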
## What does it do
The script collects the auction information and, for each auction, the URLs of the cars in its lots. Everything is divided into archived auctions and current/future auctions. The result is an Excel file with four sheets: one for the auctions and one for the car lots, for both archived and new auctions.
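The README does not say how the Excel file is produced; as a minimal sketch, assuming pandas, the four-sheet layout could be written like this (sheet and column names are hypothetical):

```python
import pandas as pd

save_path_prefix = "./output"  # destination folder from the configuration above

# Hypothetical frames holding the scraped rows; the column names are illustrative.
archived_auctions = pd.DataFrame(columns=["auction_url", "title", "date"])
archived_lots = pd.DataFrame(columns=["auction_url", "car_url"])
current_auctions = pd.DataFrame(columns=["auction_url", "title", "date"])
current_lots = pd.DataFrame(columns=["auction_url", "car_url"])

# One workbook, four sheets: auctions and car lots,
# for both archived and current/future auctions.
with pd.ExcelWriter(f"{save_path_prefix}/result.xlsx") as writer:
    archived_auctions.to_excel(writer, sheet_name="archived_auctions", index=False)
    archived_lots.to_excel(writer, sheet_name="archived_lots", index=False)
    current_auctions.to_excel(writer, sheet_name="current_auctions", index=False)
    current_lots.to_excel(writer, sheet_name="current_lots", index=False)
```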
In `example_result` some example files are available.
## More technical notes
The script uses plain HTTP requests to navigate the website and extract the information, which is much faster than using, for example, Selenium. The website returns content that is already rendered and does not rely on AJAX to load it, so the pages can be read with simple requests.
In particular, each page ends with a tag that contains the page's data; this tag is used to determine the number of pages and to get the content of each page.
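The tag's name is not preserved in this README, so the sketch below assumes the data sits in the last `<script>` tag of the page as JSON; the selector and the `pages`/`items` keys are hypothetical:

```python
import json
import time

import requests
from bs4 import BeautifulSoup

request_delay = 1.0  # throttle from the configuration above

def scrape_listing(base_url: str) -> list[dict]:
    """Sketch: read the embedded data tag on each page of a paginated listing.

    Each page ends with a tag holding the page's data, which is used both for
    the page count and for the page content. The tag lookup and the JSON
    layout ("pages", "items") are assumptions for illustration only.
    """
    items: list[dict] = []
    page = 1
    total_pages = 1
    while page <= total_pages:
        html = requests.get(f"{base_url}?page={page}").text
        soup = BeautifulSoup(html, "html.parser")
        data_tag = soup.find_all("script")[-1]  # last tag on the page (assumed)
        data = json.loads(data_tag.string)      # assumes the tag body is plain JSON
        total_pages = data["pages"]             # number of pages (assumed key)
        items.extend(data["items"])             # page content (assumed key)
        time.sleep(request_delay)               # plain HTTP requests, politely throttled
        page += 1
    return items
```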