https://github.com/petrglad/interview-webmon
Website monitoring tool (Python, Kafka, PostgreSQL)
- Host: GitHub
- URL: https://github.com/petrglad/interview-webmon
- Owner: PetrGlad
- License: apache-2.0
- Created: 2020-04-18T12:45:35.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-05-04T20:44:55.000Z (almost 6 years ago)
- Last Synced: 2025-01-24T05:07:16.431Z (about 1 year ago)
- Topics: async, did-not-land-a-job, http, kafka, monitor, postgresql, python, web
- Language: Python
- Size: 23.4 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Web sites monitor
Fetches HTTP statuses and matches response bodies against given regexps.
Results are sent to Kafka, then read from Kafka and stored in a PostgreSQL database.
The collected data is the HTTP response code, a match/no-match flag for the body regex,
the time it took the web server to respond, and the timestamp of the test.
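As an illustration of the record described above, here is a minimal sketch of how one check result could be built and serialized before being sent to Kafka. The class name `CheckResult`, its field names, and the JSON encoding are assumptions made for this example; the README does not specify the actual message format.

```python
import json
import re
import time
from dataclasses import asdict, dataclass


@dataclass
class CheckResult:
    # Field names are hypothetical; the README only lists what is collected.
    url: str
    status_code: int         # HTTP response code
    body_matches: bool       # match/no-match flag for the body regex
    response_seconds: float  # time the server took to respond
    checked_at: float        # Unix timestamp of the test


def make_result(url, status_code, body, pattern, response_seconds, checked_at=None):
    """Build one record from an already-received HTTP response."""
    return CheckResult(
        url=url,
        status_code=status_code,
        body_matches=re.search(pattern, body) is not None,
        response_seconds=response_seconds,
        checked_at=time.time() if checked_at is None else checked_at,
    )


def to_message(result):
    """Serialize a record to bytes, e.g. for use as a Kafka message value."""
    return json.dumps(asdict(result)).encode("utf-8")
```

Keeping the record free of any HTTP or Kafka client objects means the same structure can be produced on the checking side and decoded on the storage side.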
## Configuration
All configuration files are in the `./config` directory.
Review `config/config.toml` before building the container.
The required file layout is:
```
config
├── config.toml          # The configuration file, look inside for hints
├── keys
│   ├── kafka
│   │   ├── ca.pem       # kafka server CA
│   │   ├── service.cert # kafka server TLS cert
│   │   └── service.key  # user key
│   └── pg
│       ├── ca.pem       # service CA
│       └── pg.key       # contains password on the first line, whitespace is trimmed
└── sites.csv            # Path to list of sites to test
```
`sites.csv` should contain `URL,regexp` pairs, one per line. The regexp is matched
against the content returned by the URL.
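A minimal sketch of reading such a file with the standard `csv` module follows. The function name `load_sites` is hypothetical, and the sketch assumes that a regexp containing a comma is wrapped in double quotes, as usual for CSV; the project's own parser may behave differently.

```python
import csv
import io
import re


def load_sites(lines):
    """Yield (url, compiled_regex) pairs from an iterable of CSV lines."""
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected URL,regexp but got {row!r}")
        url, pattern = row
        yield url.strip(), re.compile(pattern)


# Usage with an in-memory file; a real file would be opened with newline="".
sample = io.StringIO('https://example.com,Example Domain\nhttps://example.org,"a{1,3}"\n')
for url, regex in load_sites(sample):
    print(url, regex.pattern)
```

Compiling each regexp while loading means a malformed pattern is reported at startup rather than during the first check.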
### Setup with Aiven cloud
To get the keys from Aiven cloud, use the `avn` command, for example:
```
python3 -m pip install aiven-client
avn user login YOUR_EMAIL_HERE
avn service user-creds-download --username avnadmin kafka-1 -d config/keys/kafka/
cp config/keys/kafka/ca.pem config/keys/pg/
```
Be sure to double-check the user name in the `user-creds-download` command,
as it does not verify whether a user with that name exists and may fail silently.
Unfortunately, downloading credentials this way is not supported for Postgres, so you
have to copy the database user's password from the database's service admin page into
`config/keys/pg/pg.key`.
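The password file format noted in the layout above (password on the first line, whitespace trimmed) can be read as in the sketch below. The helper name `read_password` is an assumption for illustration and is not taken from the project's code.

```python
from pathlib import Path


def read_password(path):
    """Return the first line of the key file with surrounding whitespace removed."""
    with Path(path).open(encoding="utf-8") as key_file:
        return key_file.readline().strip()
```

Anything after the first line is ignored, so a trailing newline added by an editor does not end up in the password.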
## Build and run
To run in a Docker container, add the user and service keys to `config/keys`
as described above, then run `./run.sh` to launch it, or `./test.sh`
to run the integration tests.
If you run directly on your machine, install the Python prerequisites
and the `libpq-dev` system package, e.g.:
```bash
sudo apt install libpq-dev
pip install -r requirements.txt
```
## Implementation notes
The query and storage procedures run in the same process. It should be straightforward
to separate them into different containers by splitting the main function.
I had never used async/await in Python before, so I decided that it would be an
interesting experiment. When checking lots of URLs, most of the time is likely
spent waiting for responses (the problem is I/O bound).
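The sketch below illustrates why async/await suits an I/O-bound workload. It is not the project's code: `check_site` is a hypothetical stand-in that sleeps instead of making an HTTP request, and the `.example` URLs are placeholders. Three checks of 0.2 seconds each finish in roughly 0.2 seconds in total, because the waits overlap.

```python
import asyncio
import time


async def check_site(url, delay):
    """Stand-in for one HTTP check; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay)
    return url, delay


async def check_all(sites):
    """Run all checks concurrently and collect results in input order."""
    return await asyncio.gather(*(check_site(url, delay) for url, delay in sites))


def run_demo():
    sites = [
        ("https://a.example", 0.2),
        ("https://b.example", 0.2),
        ("https://c.example", 0.2),
    ]
    started = time.monotonic()
    results = asyncio.run(check_all(sites))
    elapsed = time.monotonic() - started
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(f"{len(results)} checks finished in {elapsed:.2f}s")
```

With a real HTTP client, the `await` inside `check_site` would be the request itself, and the event loop would keep issuing other requests while each one waits for its response.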
The `aiokafka` library provides a nicer interface for async/await, but it requires
an older version of kafka-python. I could not make it work with AdminClient in time,
so kafka-python 2.x is used.