https://github.com/wenzel/web_monitor
- Host: GitHub
- URL: https://github.com/wenzel/web_monitor
- Owner: Wenzel
- Created: 2016-01-22T00:33:13.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2016-01-25T11:59:13.000Z (over 10 years ago)
- Last Synced: 2025-01-02T05:13:07.670Z (over 1 year ago)
- Language: Python
- Size: 24.4 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
# web_monitor
# requirements
- `Python 3.4`
- `virtualenv 3`
- `pip`
# setup

    virtualenv-3.4 venv
    source venv/bin/activate
    pip install -r requirements.txt

# run

    ./web_monitor.py

Then go to `http://127.0.0.1:5000/`
# design
## Read a list of web pages and content from a configuration file
The configuration file `web_monitor.yaml` is parsed at application startup.
It contains the following sample data:

    interval: 10
    sites:
      id1:
        url: 'https://www.f-secure.com/'
        content: 'f-secure'
        full_match: false

- `interval`: number of seconds between two requests to a website
- `sites`: hash describing the list of websites to be watched
    - `id1`: a short identifier for a given website
        - `url`: URL of the website to be tested
        - `content`: pattern that the website's content should match
        - `full_match`: boolean used to switch the regex call between
          `re.search` (match anywhere in the page) and `re.match` (match anchored
          at the start of the page; unlike `re.fullmatch`, it does not require the
          pattern to cover the whole page)
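
The two matching modes are easy to misread, so here is a minimal sketch of how they differ. The `content_matches` helper and the sample `page` text are illustrative assumptions, not code from the project.

```python
import re

def content_matches(content, text, full_match=False):
    """Return True if the regex `content` matches `text`.

    Hypothetical helper mirroring the switch described above:
    re.match when full_match is set, re.search otherwise.
    """
    match_func = re.match if full_match else re.search
    return match_func(content, text) is not None

# Made-up page body, for illustration only.
page = "<html><title>F-Secure</title>welcome to f-secure</html>"

# re.search finds the pattern in the middle of the text.
anywhere = content_matches("f-secure", page)
# re.match fails here because the text starts with "<html>".
at_start = content_matches("f-secure", page, full_match=True)
# re.match succeeds when the pattern matches a prefix of the text.
prefix = content_matches("<html>", page, full_match=True)
# re.fullmatch would require the pattern to cover the entire text.
whole = re.fullmatch("<html>", page) is not None
```

With `full_match: true`, a pattern therefore has to describe the beginning of the page, not the page as a whole.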
## Periodically make an HTTP request to each site
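The `monitor` function is called periodically, thanks to
a background job scheduled before the `Flask` application starts.

    # BackgroundScheduler comes from the APScheduler package
    from apscheduler.schedulers.background import BackgroundScheduler

    sched = BackgroundScheduler()
    sched.add_job(monitor, 'interval', seconds=config['interval'], args=[config])
    sched.start()
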
The `monitor` function has to check that every website
is available and matches the required content, by calling the
`check_website` function.
It uses the `ThreadPool` class from the `multiprocessing` package,
so that multiple checks can run efficiently in parallel.

    pool = ThreadPool(4)
    results = pool.map(check_website, list(config['sites'].values()))

The results are written to the log output, using `pformat` to prettify them.

    logging.info(pprint.pformat(results))

A `mutex` is used to ensure that when we update the global variable `last_status`,
it won't be read by the Flask view code at the same time.
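
The parallel checks and the lock around `last_status` can be sketched together as follows. This is only an illustration under assumptions: `check_website` is replaced by a stand-in that makes no network request, and `status_lock`, `read_status` and the second site are hypothetical names and data.

```python
import threading
from multiprocessing.pool import ThreadPool

# Shared state read by the web view, guarded by a lock (names are assumptions).
last_status = {}
status_lock = threading.Lock()

def check_website(site):
    # Stand-in for the real check: no network access, just echoes the URL.
    return {'url': site['url'], 'up': True}

def monitor(config):
    sites = list(config['sites'].values())
    # Run the checks concurrently on a small pool of worker threads.
    with ThreadPool(4) as pool:
        results = pool.map(check_website, sites)
    # Publish the new results while holding the lock.
    with status_lock:
        last_status['sites'] = results

def read_status():
    # Copy under the lock so the caller gets a consistent snapshot.
    with status_lock:
        return dict(last_status)

config = {'sites': {'id1': {'url': 'https://www.f-secure.com/'},
                    'id2': {'url': 'https://example.org/'}}}
monitor(config)
snapshot = read_status()
```

A view function would call something like `read_status()` instead of touching `last_status` directly, so it never observes an update in progress.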
## Verifies that the page content received from the server matches the content requirements
The following code checks that the page content received matches the content
given in the configuration file:

    # re.match anchors the pattern at the start of the text,
    # re.search looks for it anywhere in the text
    if self.full_match:
        match_func = re.match
    else:
        match_func = re.search
    if match_func(self.content, r.text):
        self.status['match'] = True
    else:
        self.status['match'] = False

## Measures the time it took for the web server to complete the whole request
The following code measures the time required to receive
the HTTP response:

    start = datetime.datetime.now()
    r = requests.get(self.url, timeout=Site.TIMEOUT)
    # force the download of all content
    r.content
    end = datetime.datetime.now()
    self.status['code'] = r.status_code
    self.status['elapsed'] = end - start

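
`datetime.datetime.now()` follows the wall clock, which can be adjusted while a request is in flight. As a possible alternative, not the project's method, here is a sketch that measures with the monotonic clock and still returns a `timedelta`. The `timed_call` helper and the `fake_request` stand-in are hypothetical.

```python
import datetime
import time

def timed_call(func, *args, **kwargs):
    """Call func and return (result, elapsed), with elapsed as a timedelta."""
    start = time.monotonic()
    result = func(*args, **kwargs)
    elapsed = datetime.timedelta(seconds=time.monotonic() - start)
    return result, elapsed

def fake_request():
    # Stand-in for requests.get(...) followed by reading r.content.
    time.sleep(0.05)
    return 200

code, elapsed = timed_call(fake_request)
```

Because `elapsed` is still a `timedelta`, it would fit the `elapsed` field of the status report unchanged.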
## Writes a log file that shows the progress of the periodic checks
The log file is handled with the standard Python module `logging`.
Here we configure the logger to write to both the console and `web_monitor.log`:

    def init_logger():
        logger = logging.getLogger()
        # log on the console (StreamHandler() defaults to sys.stderr, not stdout)
        logger.addHandler(logging.StreamHandler())
        # log on LOG_FILE
        file_handler = logging.FileHandler(LOG_FILE)
        logger.addHandler(file_handler)
        logger.setLevel(LOG_LEVEL)

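
Note that `logging.StreamHandler()` called without an argument writes to `sys.stderr`. If output on `stdout` is really wanted, a variant along these lines could be used. This is a sketch: the values of `LOG_FILE` and `LOG_LEVEL` and the format string are assumptions.

```python
import logging
import sys

LOG_FILE = 'web_monitor.log'   # assumed value
LOG_LEVEL = logging.INFO       # assumed value

def init_logger(log_file=LOG_FILE, level=LOG_LEVEL):
    logger = logging.getLogger()
    # Prefix each record with a timestamp and its level (assumed format).
    formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
    # Console output, explicitly on stdout instead of the stderr default.
    console = logging.StreamHandler(sys.stdout)
    console.setFormatter(formatter)
    logger.addHandler(console)
    # The same records are appended to the log file.
    file_handler = logging.FileHandler(log_file)
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)
    logger.setLevel(level)
    return logger
```

It should be called once at startup, since each call adds a new pair of handlers to the root logger.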
And here we write each newly reported website status to the log output, using
`pprint` for a more readable format:

    # write new entry into log file
    logging.info(pprint.pformat(check))

## Implement a single-page HTTP server interface
We used `Flask` to build this web server, since it is efficient and remains
very simple to understand.
Our architecture is split into modules:

    app/
        mod_webmonitor/
            controller.py
            [view.py]
            [model.py]
        static/
        templates/
            webmonitor/
                show.html

## The checking period must be configurable via a command-line option
The `docopt` module is used to define command-line parameters easily.
Here the `-c=INTERVAL` switch is defined:

    """
    Usage:
        web_monitor.py [options]

    options:
        -c=INTERVAL  Change check interval value
        -h --help    Show this screen.
        --version    Show version.
    """

The configuration is then overwritten after it has been read:

    # overwrite config with cmdline values
    check_interval_cmdline = cmdline['-c']
    if check_interval_cmdline:
        # docopt returns a string; the scheduler reads config['interval']
        config['interval'] = int(check_interval_cmdline)

## The log file must contain the checked URLs, their status and the response times
The following format is written to the log file:

    {'date': datetime.datetime(2016, 1, 22, 2, 7, 7, 953550),
     'sites': [{'code': 200,
                'config_site': {'content': 'f-secure',
                                'full_match': False,
                                'url': 'https://www.f-secure.com/'},
                'elapsed': datetime.timedelta(0, 2, 445137),
                'error': None,
                'match': True,
                'up': True}]}

- `date`: the `datetime` taken just before we began checking the websites' availability
- `sites`: the status report for each website
    - `code`: corresponding HTTP status code
    - `config_site`: a hash describing the website configuration
    - `elapsed`: the delta between the moment we started the request and the moment we received the full answer
    - `error`: an application-level error that might have happened during the test (`SSLError`, `TimeoutError`, `ConnectionError`, ...)
    - `match`: whether the content received matched the content string from the configuration file
    - `up`: whether the website has given a response, and is therefore up
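
To illustrate how such a report can be consumed, here is a small sketch that turns the record above into one summary line per site. The `summarize` helper and its output format are hypothetical; only the field names come from the sample.

```python
import datetime

def summarize(report):
    """Render one line per site from a report shaped like the sample above."""
    lines = []
    for site in report['sites']:
        lines.append(
            '{url} code={code} up={up} match={match} elapsed={secs:.3f}s'.format(
                url=site['config_site']['url'],
                code=site['code'],
                up=site['up'],
                match=site['match'],
                # timedelta -> seconds as a float
                secs=site['elapsed'].total_seconds(),
            )
        )
    return lines

# Same data as the logged sample.
report = {
    'date': datetime.datetime(2016, 1, 22, 2, 7, 7, 953550),
    'sites': [{'code': 200,
               'config_site': {'content': 'f-secure',
                               'full_match': False,
                               'url': 'https://www.f-secure.com/'},
               'elapsed': datetime.timedelta(0, 2, 445137),
               'error': None,
               'match': True,
               'up': True}],
}
lines = summarize(report)
```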