{"id":21522817,"url":"https://github.com/petrglad/interview-webmon","last_synced_at":"2025-10-29T13:50:14.652Z","repository":{"id":136212100,"uuid":"256751584","full_name":"PetrGlad/interview-webmon","owner":"PetrGlad","description":"Website monitoring tool (Python, Kafka, PostreSQL)","archived":false,"fork":false,"pushed_at":"2020-05-04T20:44:55.000Z","size":24,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-24T05:07:16.431Z","etag":null,"topics":["async","did-not-land-a-job","http","kafka","monitor","postgresql","python","web"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PetrGlad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-04-18T12:45:35.000Z","updated_at":"2020-05-11T13:35:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"15abd4e6-83e2-4bc1-85ec-4afe0ffa62b8","html_url":"https://github.com/PetrGlad/interview-webmon","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PetrGlad%2Finterview-webmon","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PetrGlad%2Finterview-webmon/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PetrGlad%2Finterview-webmon/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PetrGlad%2Finterview-webmon/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PetrGlad","download_url":"https://codeload.github.com/PetrGlad/interview-webmon/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244080007,"owners_count":20394837,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["async","did-not-land-a-job","http","kafka","monitor","postgresql","python","web"],"created_at":"2024-11-24T01:11:49.949Z","updated_at":"2025-10-29T13:50:09.613Z","avatar_url":"https://github.com/PetrGlad.png","language":"Python","readme":"# Web sites monitor\n\nFetches the HTTP status of each web site and matches the response body against a given regexp.\nThe results are published to Kafka and then stored from Kafka into a PostgreSQL database.\n\nThe collected data are the HTTP response code, a match/no-match flag for the body regexp,\nthe time it took the web server to respond, and the timestamp of the check.\n\n\n## Configuration\n\nAll configuration files are in the `./config` directory.\nReview `config/config.toml` before building the container.\nThe required file layout is:\n```\nconfig\n├── config.toml # The configuration file, look inside for hints\n├── keys\n│   ├── kafka\n│   │   ├── ca.pem # Kafka server CA\n│   │   ├── service.cert # Kafka user TLS cert\n│   │   └── service.key # user key\n│   └── pg\n│       ├── ca.pem # service CA\n│       └── pg.key # contains the password on the first line, whitespace is trimmed\n└── sites.csv # List of sites to test\n```\n`sites.csv` should contain `URL,regexp` pairs. The regexp is matched against\nthe content returned by the URL.\n\n### Setup with Aiven cloud\n\nTo get keys from Aiven cloud, use the `avn` command, for example:\n```\npython3 -m pip install aiven-client\navn user login YOUR_EMAIL_HERE\navn service user-creds-download --username avnadmin kafka-1 -d config/keys/kafka/\ncp config/keys/kafka/ca.pem config/keys/pg/\n```\nBe sure to double-check the user name in the `user-creds-download` command,\nas it does not verify whether a user with that name exists and may fail silently.\n\nUnfortunately, this is not supported for PostgreSQL, so you have to copy the database\nuser password from the database service's admin page into `config/keys/pg/pg.key`.\n\n\n## Build and run\n\nTo run in a Docker container, add the user and service keys to `config/keys`\nas described above, then run `./run.sh` to launch it, or `./test.sh`\nto run the integration tests.\n\nIf you run directly on your machine, install the Python prerequisites\nand the `libpq-dev` system package, e.g.:\n```bash\nsudo apt install libpq-dev\npip install -r requirements.txt\n```\n\n\n## Implementation notes\n\nThe query and storage procedures run in the same process. It should be straightforward\nto separate them into different containers by splitting the main function.\n\nI had never used async/await in Python before, so I decided that it would be an\ninteresting experiment. When checking many URLs, most of the\ntime is likely spent waiting for responses (the problem is I/O-bound).\n\nThe `aiokafka` library provides a nicer async/await interface, but it requires\nan older version of kafka-python, and I could not make it work with `AdminClient` in time,\nso kafka-python 2.x is used.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpetrglad%2Finterview-webmon","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpetrglad%2Finterview-webmon","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpetrglad%2Finterview-webmon/lists"}