{"id":19034229,"url":"https://github.com/wenzel/web_monitor","last_synced_at":"2026-05-04T20:30:18.353Z","repository":{"id":80246759,"uuid":"50146627","full_name":"Wenzel/web_monitor","owner":"Wenzel","description":null,"archived":false,"fork":false,"pushed_at":"2016-01-25T11:59:13.000Z","size":25,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-02T05:13:07.670Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Wenzel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-01-22T00:33:13.000Z","updated_at":"2023-09-08T17:06:08.000Z","dependencies_parsed_at":"2023-04-20T02:24:31.095Z","dependency_job_id":null,"html_url":"https://github.com/Wenzel/web_monitor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wenzel%2Fweb_monitor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wenzel%2Fweb_monitor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wenzel%2Fweb_monitor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wenzel%2Fweb_monitor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Wenzel","download_url":"https://codeload.github.com/Wenzel/web_monitor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240088491,"owners_count":19746098,"icon_url":"https://github.com
/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T21:43:53.032Z","updated_at":"2026-05-04T20:30:18.190Z","avatar_url":"https://github.com/Wenzel.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# web_monitor\n\n# requirements\n\n- `Python 3.4`\n- `virtualenv 3`\n- `pip`\n\n# setup\n\n    virtualenv-3.4 venv\n    source venv/bin/activate\n    pip install -r requirements.txt\n\n# run\n\n    ./web_monitor.py\n\nThen go to `http://127.0.0.1:5000/`\n\n# design\n\n## Read a list of web pages and content from a configuration file\n\nThe configuration file `web_monitor.yaml` is parsed at application startup.\nIt contains the following sample data :\n\n    interval: 10\n    sites:\n        id1:\n            url: 'https://www.f-secure.com/'\n            content: 'f-secure'\n            full_match: false\n\n- `interval` : number of seconds between two requests to a website\n- `sites`: hash describing the list of websites to be watched\n- `id1` : a short identifier for a given website\n- `url` : website's URL to be tested\n- `content` : website's content that should be matched\n- `full_match` : boolean used to choose between\n    `re.search` (match anywhere in the content) or `re.match` (match anchored at the start of the content)\n\n## Periodically make an HTTP request to each site\n\nThe `monitor` function is called periodically, thanks to\na background job scheduled before the `Flask` application starts.\n\n    sched = BackgroundScheduler()\n    sched.add_job(monitor, 'interval', seconds=config['interval'], args=[config])\n    sched.start()\n\nThe `monitor` function 
has to check that every website\nis available and matches the required content, by calling the\n`check_website` function.\n\nIt imports the `multiprocessing` module to use the `ThreadPool` class,\nso that we can efficiently execute multiple checks in parallel.\n\n    pool = ThreadPool(4)\n    results = pool.map(check_website, [x[1] for x in config['sites'].items()])\n\nThe results are written to the log output, using `pformat` to prettify them.\n\n    logging.info(pprint.pformat(results))\n\nA `mutex` is used to ensure that when we update the global variable `last_status`,\nit won't be read by the Flask view code at the same time.\n\n## Verifies that the page content received from the server matches the content requirements\n\nThe following code checks that the received page content matches the content\ngiven in the configuration file :\n\n        if self.full_match:\n            match_func = re.match\n        else:\n            match_func = re.search\n        if match_func(r'{}'.format(self.content), r.text):\n            self.status['match'] = True\n        else:\n            self.status['match'] = False\n\n## Measures the time it took for the web server to complete the whole request\n\nThe following code is responsible for measuring the time required to receive\nthe HTTP response :\n\n        start = datetime.datetime.now()\n        r = requests.get(self.url, timeout=Site.TIMEOUT)\n        # force the download of all content\n        r.content\n        end = datetime.datetime.now()\n        self.status['code'] = r.status_code\n        self.status['elapsed'] = end - start\n\n## Writes a log file that shows the progress of the periodic checks\n\nThe log file is handled with the standard Python module `logging`.\nHere we configure the logger output on both `stdout` and `web_monitor.log` :\n\n    def init_logger():\n        logger = logging.getLogger()\n        # log on stdout\n        logger.addHandler(logging.StreamHandler())\n        # log on LOG_FILE\n        file_handler = 
logging.FileHandler(LOG_FILE)\n        logger.addHandler(file_handler)\n        logger.setLevel(LOG_LEVEL)\n\nThen we write each newly reported website status to the log output, using\n`pprint` to have a more readable format :\n\n    # write new entry into log file\n    logging.info(pprint.pformat(check))\n\n## Implement a single-page HTTP server interface\n\nWe used `Flask` to build this web server, since it's efficient and remains\nvery simple to understand.\n\nOur architecture is split into modules :\n\n    app/\n        mod_webmonitor/\n            controller.py\n            [view.py]\n            [model.py]\n        static/\n        templates/\n            webmonitor/\n                show.html\n\n\n## The checking period must be configurable via a command-line option\n\nThe module `docopt` has been used to easily define new command-line parameters.\nHere the `-c=INTERVAL` switch is defined :\n\n    \"\"\"\n    Usage:\n        web_monitor.py [options]\n\n    options:\n        -c=INTERVAL         Change check interval value\n        -h --help           Show this screen.\n        --version           Show version.\n    \"\"\"\n\nThe configuration is then overridden after it has been read :\n\n    # overwrite config with cmdline values\n    check_interval_cmdline = cmdline['-c']\n    if check_interval_cmdline:\n        config['check'] = check_interval_cmdline\n\n## The log file must contain the checked URLs, their status and the response times\n\nThe following format is printed in the log file :\n\n    {'date': datetime.datetime(2016, 1, 22, 2, 7, 7, 953550),\n     'sites': [{'code': 200,\n                'config_site': {'content': 'f-secure',\n                                'full_match': False,\n                                'url': 'https://www.f-secure.com/'},\n                'elapsed': datetime.timedelta(0, 2, 445137),\n                'error': None,\n                'match': True,\n                'up': True}]}\n\n- `date` : contains the `datetime` 
just before we began to check website availability.\n- `sites` : contains the status report for each website\n- `code` : corresponding HTTP status code\n- `config_site` : a hash describing the website configuration\n- `elapsed` : the delta between the moment we started the request and the moment we received the full response\n- `error` : an error describing a problem at application level that might have happened during the test (`SSLError`, `TimeoutError`, `ConnectionError`, ...)\n- `match` : whether the received content matched the content string given in the configuration file\n- `up` : whether the website returned a response, and is therefore up\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwenzel%2Fweb_monitor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwenzel%2Fweb_monitor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwenzel%2Fweb_monitor/lists"}