https://github.com/rozek1997/otodom-scrapper
Web scraper for otodom.pl
- Host: GitHub
- URL: https://github.com/rozek1997/otodom-scrapper
- Owner: rozek1997
- License: mit
- Created: 2019-10-23T23:06:26.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2021-06-02T00:31:53.000Z (over 4 years ago)
- Last Synced: 2025-03-30T16:03:16.762Z (7 months ago)
- Topics: beautifulsoup4, python3, scrapper
- Language: Python
- Size: 8.79 KB
- Stars: 5
- Watchers: 2
- Forks: 7
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Install package
In the /src folder:
pip3 install -r requirements.txt
Run the scraper
python3 main.py with the arguments below.
Arguments:
For help:
-h, --help show this help message and exit
Essential arguments:
-p specify whether you are interested in a house (dom) or an apartment (mieszkanie)
-rt specify whether you are interested in rental (wynajem) or sale (sprzedaz)
-c specify the city you are interested in
Optional arguments:
-d specify the district you are interested in
Example:
python3 main.py -p mieszkanie -rt wynajem -c warszawa -d bemowo
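For illustration, here is a minimal sketch of how a command line with these flags could be wired up using argparse. The flag names follow the help text above, but the actual argument handling in main.py may differ.

```python
import argparse

def parse_args():
    # Hypothetical reconstruction of the CLI described above;
    # the real main.py may define or validate these flags differently.
    parser = argparse.ArgumentParser(description="Scrape offers from otodom.pl")
    parser.add_argument("-p", required=True, choices=["dom", "mieszkanie"],
                        help="house (dom) or apartment (mieszkanie)")
    parser.add_argument("-rt", required=True, choices=["wynajem", "sprzedaz"],
                        help="rental (wynajem) or sale (sprzedaz)")
    parser.add_argument("-c", required=True,
                        help="city of interest, e.g. warszawa")
    parser.add_argument("-d", default=None,
                        help="optional district, e.g. bemowo")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(args.p, args.rt, args.c, args.d)
```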
Additional information
The scraper creates /img and /json directories alongside the /src directory (see the sketch after this list).
In the /img directory the scraper saves all downloaded photos.
In the /json directory the scraper saves the query results in JSON format.
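A minimal sketch of creating those output directories, assuming they are resolved relative to the current working directory:

```python
from pathlib import Path

# Create the ./img and ./json output directories if they are missing.
# Whether they end up next to /src depends on where the scraper is started from.
for name in ("img", "json"):
    Path(name).mkdir(parents=True, exist_ok=True)
```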
How it works
The scraper downloads all listing pages of the category determined by the user's arguments.
If no page is found for that category, an error is thrown.
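A hypothetical sketch of this step. The URL pattern below is an assumption made purely for illustration; it is not taken from the scraper's code or from otodom.pl's current site layout.

```python
import requests

def fetch_category_page(property_type, rent_type, city, district=None, page=1):
    """Download one listing page for the chosen category.

    The path layout (rent type / property type / city / district) is an
    assumed mapping from the CLI arguments, not the scraper's real URL builder.
    """
    path = f"{rent_type}/{property_type}/{city}"
    if district:
        path += f"/{district}"
    url = f"https://www.otodom.pl/{path}/?page={page}"
    response = requests.get(url, timeout=10)
    if response.status_code != 200 or not response.text:
        # Mirrors the behaviour described above: no page found -> raise an error.
        raise RuntimeError(f"No listing page found for {url}")
    return response.text
```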
The scraper goes through all pages of the chosen category, page by page: on each listing page it collects the individual offers, opens each offer's subpage to mine additional information, and then moves on to the next offer.
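A rough sketch of that page-by-page walk using requests and BeautifulSoup. The CSS selectors are placeholders: otodom.pl's markup changes over time, and the scraper's real selectors are not shown here.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def scrape_category(first_page_url):
    """Yield one record per offer, page by page (placeholder selectors)."""
    url = first_page_url
    while url:
        page = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        for link in page.select("article a[href]"):   # offer links on the listing page
            offer_url = urljoin(url, link["href"])
            offer = BeautifulSoup(requests.get(offer_url, timeout=10).text, "html.parser")
            # Mine additional details from the offer subpage here,
            # e.g. price, area, description, photo URLs.
            yield {"url": offer_url,
                   "title": offer.title.string if offer.title else None}
        next_link = page.find("a", rel="next")        # pagination link, if any
        url = urljoin(url, next_link["href"]) if next_link else None
```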
The scraper can run into problems when otodom.pl blocks its page requests. If that happens, it waits 200 ms and then tries again.
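A small sketch of that retry behaviour. The retry limit is an added safeguard for the example, not something the README specifies.

```python
import time
import requests

def get_with_retry(url, retries=5, wait_seconds=0.2):
    """Retry a blocked or failed request after a 200 ms pause."""
    last_error = None
    for _ in range(retries):
        try:
            response = requests.get(url, timeout=10)
            if response.status_code == 200:
                return response
            last_error = RuntimeError(f"HTTP {response.status_code} for {url}")
        except requests.RequestException as exc:
            last_error = exc
        time.sleep(wait_seconds)   # wait 200 ms, then try again
    raise last_error
```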
JSON files for each page are saved in the ./json subdirectory of the current working directory.
Images are saved in the ./img subdirectory of the current working directory.
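A sketch of how one page of results could be written into those directories. The file-naming scheme is illustrative, not necessarily the scraper's own.

```python
import json
from pathlib import Path

def save_page_results(page_number, offers, image_blobs):
    """Write one page of offers to ./json and its photos to ./img."""
    json_dir = Path("json")
    img_dir = Path("img")
    json_dir.mkdir(exist_ok=True)
    img_dir.mkdir(exist_ok=True)

    # One JSON file per listing page.
    with open(json_dir / f"page_{page_number}.json", "w", encoding="utf-8") as fh:
        json.dump(offers, fh, ensure_ascii=False, indent=2)

    # image_blobs: mapping of file name -> raw image bytes downloaded earlier.
    for name, data in image_blobs.items():
        (img_dir / name).write_bytes(data)
```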