Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tadghw/pararius-scraper
Node.js script for filling a database with information about a subset of properties in various settlements in the Netherlands, along with a script that analyses that data and a script for database management
netherlands nodejs pararius rentals scraping
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/tadghw/pararius-scraper
- Owner: TadghW
- Created: 2024-02-09T20:31:42.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-02-28T14:51:31.000Z (10 months ago)
- Last Synced: 2024-02-28T16:00:31.749Z (10 months ago)
- Topics: netherlands, nodejs, pararius, rentals, scraping
- Language: JavaScript
- Homepage:
- Size: 41 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### Pararius Scraper
This application uses Node.js and Puppeteer to scrape a list of settlements in the Netherlands for properties that fit your search parameters. It will find properties not yet in your database, gather information about them, and then upload them to a Firestore database.
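The "find properties not yet in your database" step can be pictured as a simple set difference. The following is a minimal sketch, not the repository's actual code: `findNewListings` and the `id`, `city` and `price` fields are hypothetical names chosen for illustration, and the real schema is the one shown in `index.mjs`.

```javascript
// Hypothetical helper: keep only listings whose id is not already stored.
// `id` is an assumed field name; the real schema is defined in index.mjs.
function findNewListings(scraped, knownIds) {
  const known = new Set(knownIds); // ids already present in the database
  const seen = new Set(); // ids already accepted during this run
  return scraped.filter((listing) => {
    if (known.has(listing.id) || seen.has(listing.id)) return false;
    seen.add(listing.id);
    return true;
  });
}

// Made-up sample data for illustration only.
const scraped = [
  { id: "a1", city: "Utrecht", price: 1500 },
  { id: "b2", city: "Leiden", price: 1300 },
  { id: "a1", city: "Utrecht", price: 1500 }, // duplicate within one scrape
];
const fresh = findNewListings(scraped, ["b2"]);
console.log(fresh.map((l) => l.id)); // [ 'a1' ]
```

Only the listings returned by such a filter would then need a detail-page visit and an upload, which keeps the number of requests low.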
It also contains a script to demonstrate how you can use that data to calculate price gradients between Dutch settlements and proxies for other information, like the competitiveness of a local housing market.
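As a rough illustration of what a price gradient calculation could look like, the sketch below takes the gradient between two settlements to be the relative difference in median rent per square metre. That definition, the function names, and the `city`, `price` and `area` fields are all assumptions made for this example; `stats.mjs` may compute things differently.

```javascript
// Median of a non-empty array of numbers.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Group listings by settlement and take the median rent per square metre.
// `city`, `price` and `area` are assumed field names.
function medianRentPerM2(listings) {
  const byCity = new Map();
  for (const { city, price, area } of listings) {
    if (!byCity.has(city)) byCity.set(city, []);
    byCity.get(city).push(price / area);
  }
  const result = {};
  for (const [city, rates] of byCity) result[city] = median(rates);
  return result;
}

// Relative difference in median rent per square metre going from `from` to `to`.
function priceGradient(medians, from, to) {
  return (medians[to] - medians[from]) / medians[from];
}

// Made-up sample values for illustration only.
const sample = [
  { city: "Utrecht", price: 1500, area: 50 },
  { city: "Utrecht", price: 2000, area: 80 },
  { city: "Utrecht", price: 1200, area: 60 },
  { city: "Zwolle", price: 1000, area: 50 },
  { city: "Zwolle", price: 1200, area: 60 },
];
const medians = medianRentPerM2(sample);
console.log(medians); // { Utrecht: 25, Zwolle: 20 }
console.log(priceGradient(medians, "Zwolle", "Utrecht")); // 0.25
```

The median is used here rather than the mean so that a handful of unusually expensive listings does not dominate a small sample.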
You could deploy the scraper in a cloud environment and run it with a cron job, but given the slow churn rate of Dutch properties you might as well run it once a day on your own computer.
Please be respectful of Pararius's bandwidth and compute. If you pull too much information at once you will be hit with a captcha that breaks the script. For that reason I've kept the application single-threaded, which keeps its request rate low.
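The idea of fetching one page at a time with a pause in between can be sketched as follows. This is an illustrative pattern under stated assumptions, not the code in `index.mjs`: `scrapeSequentially`, the injected `fetchPage` callback, the default delay and the example URLs are all hypothetical.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit URLs strictly one after another, pausing between requests.
// `fetchPage` is an injected async function (for example a wrapper around a
// Puppeteer page visit); `delayMs` is an assumed politeness delay.
async function scrapeSequentially(urls, fetchPage, delayMs = 5000) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    results.push(await fetchPage(urls[i])); // never more than one request in flight
    if (i < urls.length - 1) await sleep(delayMs); // no pause after the last page
  }
  return results;
}

// Usage with a stub instead of a real browser, and a short delay for the demo.
const demoUrls = ["https://example.com/page/1", "https://example.com/page/2"];
scrapeSequentially(demoUrls, async (url) => `fetched ${url}`, 10).then((pages) =>
  console.log(pages)
);
```

Awaiting each request inside a plain `for` loop, rather than using `Promise.all`, is what keeps the requests from overlapping.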
#### To use this application you will need:
- Node.js
- npm or Yarn
- A Firebase project with a Firestore database and collection configured with the schema demonstrated in `index.mjs`
- Valid credentials to access your database

#### Setup:
- **Clone the repository** with `git clone https://github.com/TadghW/pararius-scraper.git`
- **Install the application's requirements** with `npm install` at the root of the project directory
- **Create and populate** a `firebaseConfig.json` file at the root of the project directory
- **Configure settings** in `config.mjs` to match your search criteria
- **Store your user credentials** JSON in a folder at the project's root

#### Usage:
- Use any of the three scripts as you see fit
- `index.mjs`: scrapes Pararius, filters the results and populates your database. No arguments required.
- `stats.mjs`: runs the example data processing. No arguments required.
- `clean_database.mjs`: contains utilities for repairing problems with the data in your database, which is useful if you have been experimenting with the scraper.