https://github.com/aegrah/webscraper
- Host: GitHub
- URL: https://github.com/aegrah/webscraper
- Owner: Aegrah
- Created: 2021-12-26T11:50:27.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-30T15:29:57.000Z (8 months ago)
- Last Synced: 2024-10-29T21:25:05.312Z (about 2 months ago)
- Language: Python
- Size: 26.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# webscraper
This web scraper was created as a hobby project to learn a bit of Python. The scraper is currently configured to scrape Funda, the Dutch property listing website, for new properties. When it finds one, it sends the user an e-mail notification with the relevant details of the property.
It can be hosted on a VPS or on a hosting platform such as PythonAnywhere or Heroku so that it runs continuously.
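The "find a new property, then send an e-mail" flow described above could look roughly like the sketch below. It is not taken from the repository: the function names, the listing fields (`url`, `address`, `price`) and the SMTP parameters are all assumptions made for illustration, and the step that actually parses the Funda pages is left out.

```python
import smtplib
from email.message import EmailMessage


def find_new_listings(seen_urls, listings):
    """Return the listings whose URL is not in seen_urls, then mark them as seen.

    `listings` is assumed to be a list of dicts with "url", "address" and
    "price" keys (hypothetical field names, not from the original project).
    """
    new = [item for item in listings if item["url"] not in seen_urls]
    seen_urls.update(item["url"] for item in new)
    return new


def build_notification(listing, sender, recipient):
    """Compose a plain-text e-mail describing one listing."""
    msg = EmailMessage()
    msg["Subject"] = f"New property: {listing['address']}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Address: {listing['address']}\n"
        f"Price: {listing['price']}\n"
        f"Link: {listing['url']}\n"
    )
    return msg


def send_notification(msg, host, port, username, password):
    """Send the message over SMTP with TLS; the credentials are placeholders."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(username, password)
        server.send_message(msg)
```

Keeping the set of already-seen URLs separate from the mail-sending code makes it possible to check the "is this property new?" logic without touching a mail server.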
To-dos:
- Telegram bot to push notifications
- Evade robot checks:
  - ~~Set randomized sleep timers~~
  - ~~Set randomized user-agents~~
  - ~~Set scraping times at working hours (8 AM - 7 PM)~~
- Add user options:
  - Allow users to specify URL
  - Set user properties (type of house, price, neighbourhood etc.)

Added randomized sleep timers and user agents. Also added a line that stops the program after 12 hours. To execute it every day from 6 AM, run the script using a cron job (this requires `main.py` to be executable and to start with a Python shebang line):
```
0 6 * * * /home/aegrah/development/webscraper/main.py
```
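As a rough illustration of the randomized sleep timers, randomized user agents, working-hours window and 12-hour cut-off mentioned above, a main loop could be structured as follows. This is a sketch under stated assumptions, not the project's actual code: the user-agent strings are examples, the 60 to 300 second delay range is invented, the window is assumed to be 08:00 to 19:00, and `scrape_once` is a hypothetical callback that performs one scrape with the given headers.

```python
import random
import time
from datetime import datetime, time as dtime

# Example user-agent strings (illustrative only, not from the repository).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def random_headers():
    """Pick a different user agent for each request."""
    return {"User-Agent": random.choice(USER_AGENTS)}


def random_delay(min_seconds=60, max_seconds=300):
    """Return a random pause length in seconds (the range is an assumption)."""
    return random.uniform(min_seconds, max_seconds)


def within_hours(now, start=dtime(8, 0), end=dtime(19, 0)):
    """True if `now` falls inside the assumed 08:00-19:00 scraping window."""
    return start <= now.time() < end


def run(scrape_once, max_runtime_seconds=12 * 60 * 60,
        sleep=time.sleep, clock=time.monotonic, now=datetime.now):
    """Scrape at random intervals and stop once the runtime limit is reached."""
    deadline = clock() + max_runtime_seconds
    while clock() < deadline:
        if within_hours(now()):
            scrape_once(random_headers())
        sleep(random_delay())
```

The `sleep`, `clock` and `now` parameters exist only so that the loop can be exercised with fake time in a test; in normal use the defaults apply and `run(my_scrape_function)` returns after roughly 12 hours, at which point the next cron invocation starts a fresh run.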