https://github.com/dislick/beobachter
Beobachter (German for "observer") periodically scrapes web pages and extracts values.
- Host: GitHub
- URL: https://github.com/dislick/beobachter
- Owner: dislick
- Created: 2019-11-27T16:04:10.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-05T02:22:08.000Z (over 3 years ago)
- Last Synced: 2025-02-23T21:35:34.898Z (over 1 year ago)
- Language: TypeScript
- Homepage:
- Size: 863 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 10
Metadata Files:
- Readme: README.md
# Beobachter 👀
> Early alpha! Everything is subject to change and certain features are not implemented yet.
## About
This Node.js service performs highly configurable, periodic web scraping. Results can be stored in time-series databases for further analysis using visualization tools like [Grafana](https://grafana.com).
## Short Example
A configuration like this downloads a web page every 60 seconds and runs a JavaScript function on it to extract the price of an item. It uses the simple `adapter-console` to write the extracted values to standard output.
```jsonc
{
"adapters": [{ "name": "adapter-console", "config": { "colors": true } }],
"tasks": [
{
"type": "browser",
"name": "digitec-amd-ryzen-9-3950x",
"description": "Watches the price of the new 16-core Ryzen 3950X",
"url": "https://www.digitec.ch/de/s1/product/amd-ryzen-9-3950x-am4-360ghz-16-core-prozessor-11239808",
"fn": "return parseInt(document.querySelector('meta[property=\"product:price:amount\"]').content, 10)",
"interval": 60
}
]
}
```
The resulting output looks like this:
```
[2019-12-13T21:27:54] [digitec-amd-ryzen-9-3950x] [60s] 854
[2019-12-13T21:28:55] [digitec-amd-ryzen-9-3950x] [60s] 854
[2019-12-13T21:29:56] [digitec-amd-ryzen-9-3950x] [60s] 854
```
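To make the role of `fn` more concrete, here is a minimal TypeScript sketch of how a function body stored as a string could be turned into a callable and run against a page. It is an illustration under assumptions: `evaluateFn`, `DocumentLike`, and `stubDocument` are hypothetical names, and a stub object stands in for the real browser `document`. How Beobachter actually executes `fn` may differ.

```typescript
// Hypothetical sketch: Beobachter's real evaluation mechanism may differ.

// Minimal stand-in for the parts of the DOM that the example `fn` touches.
interface DocumentLike {
  querySelector(selector: string): { content: string } | null;
}

// Wraps the function body from the task's `fn` field in a function that
// receives `document` as its only argument, then calls it.
function evaluateFn(fnBody: string, document: DocumentLike): unknown {
  const fn = new Function("document", fnBody) as (doc: DocumentLike) => unknown;
  return fn(document);
}

// Stub page exposing the price meta tag used in the example task.
// The content value is made up for this illustration.
const stubDocument: DocumentLike = {
  querySelector: (selector) =>
    selector === 'meta[property="product:price:amount"]' ? { content: "854.00" } : null,
};

const fnBody =
  "return parseInt(document.querySelector('meta[property=\"product:price:amount\"]').content, 10)";

const price = evaluateFn(fnBody, stubDocument);
console.log(price); // 854
```

Because `new Function` executes arbitrary code, a setup along these lines would only be appropriate for configuration files you trust.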
## Documentation
### Adapters
Adapters are required to record values. The following adapters are available/planned:
- `adapter-console` Logs to standard output. `[available]`
- `adapter-influxdb` Stores values in InfluxDB. `[planned]`
- `adapter-csv` Writes values to a CSV file. `[planned]`
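
As a rough illustration of what an adapter does, the sketch below defines a hypothetical `Adapter` interface and a console-style implementation that reproduces the line format from the short example. The interface shape, `Measurement`, `formatLine`, and `ConsoleAdapter` are assumptions made for illustration; they are not taken from the Beobachter codebase.

```typescript
// Hypothetical adapter contract, not Beobachter's actual API.
interface Measurement {
  taskName: string;
  intervalSeconds: number;
  value: number | string;
  timestamp: Date;
}

interface Adapter {
  record(measurement: Measurement): void;
}

// Formats a measurement like the sample output:
// [2019-12-13T21:27:54] [task-name] [60s] 854
function formatLine(m: Measurement): string {
  // Assumption: timestamps are printed in UTC. toISOString() yields
  // e.g. 2019-12-13T21:27:54.000Z, so the first 19 characters are kept.
  const time = m.timestamp.toISOString().slice(0, 19);
  return `[${time}] [${m.taskName}] [${m.intervalSeconds}s] ${m.value}`;
}

class ConsoleAdapter implements Adapter {
  record(measurement: Measurement): void {
    console.log(formatLine(measurement));
  }
}

const adapter: Adapter = new ConsoleAdapter();
adapter.record({
  taskName: "digitec-amd-ryzen-9-3950x",
  intervalSeconds: 60,
  value: 854,
  timestamp: new Date("2019-12-13T21:27:54Z"),
});
```

Under this design, a planned adapter such as `adapter-csv` would only need its own `record` implementation, while the scraping logic stays unchanged.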
### Configuration
Beobachter can find its configuration in one of two ways:
1. A `config.json` file in the same directory.
2. The environment variable `BEOBACHTER_CONFIG`, set to the absolute path of your `config.json`.
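
The lookup described above could be sketched as follows. This is a minimal illustration, not Beobachter's actual code: `resolveConfigPath` and `Env` are hypothetical names, and the precedence (environment variable first, then the local file) is an assumption, since the order is not stated here.

```typescript
// Hypothetical helper: not part of Beobachter's actual code.
type Env = Record<string, string | undefined>;

// Returns the path of the configuration file to load.
function resolveConfigPath(env: Env, defaultPath: string = "./config.json"): string {
  const fromEnv = env["BEOBACHTER_CONFIG"];
  // Assumption: the environment variable wins when it is set and non-empty.
  if (fromEnv !== undefined && fromEnv.trim() !== "") {
    return fromEnv;
  }
  // Otherwise fall back to a config.json next to the service.
  return defaultPath;
}

console.log(resolveConfigPath({}));
console.log(resolveConfigPath({ BEOBACHTER_CONFIG: "/etc/beobachter/config.json" }));
```

In a Node.js process this helper could be called as `resolveConfigPath(process.env)`.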
#### Adapters
Beobachter requires at least one configured adapter. For each adapter, specify its `name` and its `config`; the available settings for each adapter can be found under `packages/adapter-*` in this repository.
```json
{
"adapters": [
{
"name": "adapter-console",
"config": {
"colors": true
}
}
]
}
```
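
The "at least one adapter" requirement can be checked before the service starts. The sketch below is a hypothetical validator, assuming only the `name` and `config` fields shown above; `AdapterEntry` and `validateAdapters` are illustrative names, and treating `config` as mandatory is an assumption.

```typescript
// Hypothetical types mirroring the JSON above; no fields beyond `name`
// and `config` are assumed.
interface AdapterEntry {
  name: string;
  config: Record<string, unknown>;
}

// Returns a list of problems; an empty list means the section looks valid.
function validateAdapters(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) {
    return ["configuration must be an object"];
  }
  const adapters = (raw as { adapters?: unknown }).adapters;
  if (!Array.isArray(adapters) || adapters.length === 0) {
    return ["at least one adapter must be configured"];
  }
  const problems: string[] = [];
  adapters.forEach((entry: unknown, i: number) => {
    const candidate = entry as Partial<AdapterEntry> | null;
    const adapterName = candidate?.name;
    const adapterConfig = candidate?.config;
    if (typeof adapterName !== "string" || adapterName === "") {
      problems.push(`adapters[${i}].name must be a non-empty string`);
    }
    // Assumption: `config` is treated as mandatory in this sketch.
    if (typeof adapterConfig !== "object" || adapterConfig === null) {
      problems.push(`adapters[${i}].config must be an object`);
    }
  });
  return problems;
}

const parsed: unknown = JSON.parse(
  '{ "adapters": [{ "name": "adapter-console", "config": { "colors": true } }] }'
);
console.log(validateAdapters(parsed)); // []
```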
#### Tasks
Tasks tell Beobachter what it needs to do. You can specify any number of tasks. There are different types of tasks:
- `browser`
- `http-json`
- `http-text`
> You must not use spaces in `name`. This constraint allows adapters to record data more predictably.
```jsonc
{
"tasks": [
{
"type": "browser",
"name": "github-vscode-stargazers-browser",
"description": "",
"url": "https://github.com/microsoft/vscode/stargazers",
"fn": "return parseInt(document.querySelector('#repos .Counter').innerText.replace(/,/g, ''), 10)",
"interval": 60 // seconds
},
{
"type": "http-json",
"name": "github-vscode-stargazers-json",
"description": "",
"url": "https://api.github.com/repos/microsoft/vscode",
"path": "stargazers_count",
"interval": 60 // seconds
}
]
}
```
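
The task shapes and the `name` constraint above can be expressed in TypeScript roughly as follows. This is a sketch under assumptions: the type names, `isValidTaskName`, `extractorOf`, and `extractByPath` are hypothetical, the `http-text` task is left out because its fields are not shown here, and support for dot-separated paths is a guess, since the example only uses a single top-level key.

```typescript
// Hypothetical task shapes derived from the JSON example above.
interface BaseTask {
  name: string;
  description: string;
  url: string;
  interval: number; // seconds
}
interface BrowserTask extends BaseTask { type: "browser"; fn: string; }
interface HttpJsonTask extends BaseTask { type: "http-json"; path: string; }
type Task = BrowserTask | HttpJsonTask;

// Enforces the "no spaces in `name`" rule (this sketch rejects any whitespace).
function isValidTaskName(taskName: string): boolean {
  return taskName.length > 0 && !/\s/.test(taskName);
}

// Narrowing on `type` gives access to the type-specific field.
function extractorOf(task: Task): string {
  return task.type === "browser" ? task.fn : task.path;
}

// Looks up a value in parsed JSON. Assumption: `path` may be a
// dot-separated chain of keys.
function extractByPath(data: unknown, path: string): unknown {
  let current: unknown = data;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

const jsonTask: HttpJsonTask = {
  type: "http-json",
  name: "github-vscode-stargazers-json",
  description: "",
  url: "https://api.github.com/repos/microsoft/vscode",
  path: "stargazers_count",
  interval: 60,
};

// Made-up sample payload; a real API response contains many more fields.
const samplePayload = { full_name: "microsoft/vscode", stargazers_count: 100000 };

console.log(isValidTaskName(jsonTask.name)); // true
console.log(extractByPath(samplePayload, extractorOf(jsonTask))); // 100000
```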