https://github.com/piraces/tomcat-logs-utilities
A collection of Python scripts for parsing and applying heuristics in tomcat access logs, used in the Project "Process Mining for Security"
- Host: GitHub
- URL: https://github.com/piraces/tomcat-logs-utilities
- Owner: piraces
- License: gpl-3.0
- Created: 2016-03-15T20:36:42.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2017-02-05T19:21:29.000Z (over 9 years ago)
- Last Synced: 2025-02-25T20:43:29.133Z (over 1 year ago)
- Topics: csv, process-mining, python, tomcat-log
- Language: Python
- Homepage:
- Size: 3.57 MB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Process Mining for Security - Tomcat Logs Utilities
This repository contains an extensible Python script for parsing Apache Tomcat logs for several purposes.
This tool has been used for tests with the ["Process Mining for Security" methodology](http://sid.cps.unizar.es/PMS/).
The related project is available in this repository in PDF format [here](docs/TAZ-TFG-2016-2185.pdf).
## Functions
This pre-processing tool can do the following with Tomcat web server (access) logs:
- Convert raw logs to CSV, with a custom or default header (obtained from the Tomcat logging configuration).
- Apply a session heuristic to identify the different sessions in a given log, if session information is not present.
- Apply multiple use case heuristics to determine the possible use cases in a session.
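
As an illustration of what a session heuristic can look like, here is a minimal sketch that is not taken from the repository. It assumes a common inactivity rule: a host starts a new session after 30 minutes without requests. The names `SESSION_TIMEOUT` and `assign_sessions` are hypothetical, and the code is written for Python 3, independently of the tool's Python 2 scripts.

```python
from datetime import datetime, timedelta

# Assumed inactivity threshold; the tool's actual session rule may differ.
SESSION_TIMEOUT = timedelta(minutes=30)
# Layout of Tomcat's %t timestamp, e.g. 15/Mar/2016:20:36:42 +0100
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def assign_sessions(events, timeout=SESSION_TIMEOUT):
    """Label (host, timestamp) events, assumed sorted by time, with session ids.

    A host starts a new session when it has been inactive longer than `timeout`.
    """
    last_seen = {}  # host -> time of its previous request
    current = {}    # host -> session id currently assigned to it
    next_id = 0
    labelled = []
    for host, stamp in events:
        when = datetime.strptime(stamp, TIME_FORMAT)
        if host not in last_seen or when - last_seen[host] > timeout:
            next_id += 1
            current[host] = next_id
        last_seen[host] = when
        labelled.append((host, stamp, current[host]))
    return labelled


events = [
    ("10.0.0.1", "15/Mar/2016:20:00:00 +0100"),
    ("10.0.0.2", "15/Mar/2016:20:05:00 +0100"),
    ("10.0.0.1", "15/Mar/2016:20:10:00 +0100"),
    ("10.0.0.1", "15/Mar/2016:21:00:00 +0100"),
]
for row in assign_sessions(events):
    print(row)
```

In this sample the last request from `10.0.0.1` arrives 50 minutes after its previous one, so it opens a new session.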
The tool also performs these tasks in the background:
- Search and remove bad characters from raw logs.
- Register (output to one file) "strange" events identified in the pre-processing of logs.
- Choose and apply the desired delimiter for the resulting CSV file.
- Extract statistics from the pre-processing run.
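
The CSV conversion and the background tasks listed above can be sketched together as follows. This is only an illustration, not the code in `parser.py`: it assumes the logs follow Tomcat's "common" access log pattern (`%h %l %u %t "%r" %s %b`), treats control characters as the "bad" characters, and uses hypothetical column names and a hypothetical `preprocess` function. It is written for Python 3.

```python
import csv
import io
import re

# Tomcat's "common" access log pattern: %h %l %u %t "%r" %s %b
LINE_RE = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)$'
)
# Hypothetical column names; the tool's real default header may differ.
HEADER = ["host", "ident", "user", "time", "request", "status", "size"]
# Control characters other than tab; treating these as "bad" is an assumption.
BAD_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def preprocess(lines, delimiter=","):
    """Return (csv_text, strange_lines, stats) for an iterable of raw log lines."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(HEADER)
    strange = []
    stats = {"total": 0, "converted": 0, "strange": 0, "bad_chars_removed": 0}
    for raw in lines:
        stripped = raw.rstrip("\r\n")
        cleaned = BAD_CHARS.sub("", stripped)
        stats["total"] += 1
        stats["bad_chars_removed"] += len(stripped) - len(cleaned)
        match = LINE_RE.match(cleaned)
        if match is None:
            # Lines that do not fit the pattern are set aside as "strange" events.
            strange.append(cleaned)
            stats["strange"] += 1
        else:
            writer.writerow(match.groups())
            stats["converted"] += 1
    return out.getvalue(), strange, stats


raw_lines = [
    '10.0.0.1 - - [15/Mar/2016:20:36:42 +0100] "GET /app/lo\x00gin.jsp HTTP/1.1" 200 1432\n',
    "\x01not an access log line\n",
]
csv_text, strange, stats = preprocess(raw_lines, delimiter=";")
print(csv_text)
print(strange)
print(stats)
```

Collecting the unparseable lines separately, rather than dropping them, keeps them available for later inspection, which matches the idea of registering "strange" events in a file of their own.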
The main purpose of the tool is to validate the heuristics proposed in the ["Process Mining for Security" methodology](http://sid.cps.unizar.es/PMS/), and to enable several process mining tasks starting from raw Tomcat logs.
## Important Notes
Note that the use case heuristics only work on the logs of the custom system they were designed for; they have to be changed to pre-process logs from different web information systems. To make these changes, you'll have to modify the "cases" and "paths" arrays in cases_heuristic.py to reflect the main behaviour of your web information system.
Also, if you want to use a custom header, you'll have to modify the global variables in parser.py.
## Installation
This tool requires [Python 2.x](https://www.python.org) to run. It does not work with Python 3.
It does not require additional packages.
## Execution
Execution has to follow the format: `python parser.py inputFile outputFile (default|custom) [heuristicName]`.
An example of use could be the following:
```sh
$ python parser.py log.txt output.csv default h1_3
```
The above example takes "log.txt", pre-processes it, applies the specified heuristic (and the session heuristic), and outputs the CSV needed for subsequent tasks.
## Available heuristics
**The list of available heuristics is the following:** h1_0, h1_1, h1_2, h1_3, h2.
- **H1_x** heuristics check the "entry point" of use cases to detect a new use case, and a list of common use case pages to check the current use case. Different versions are provided, with different levels of "granularity" (extra behaviors detected).
- The **H2** heuristic checks the longest possible path of use cases to detect a new use case, and a list of common use case pages to check the current use case. Furthermore, it considers extra behaviors (from bots and other strange and correct cases).
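
To make the entry point idea more concrete, the following is a rough sketch in the spirit of the H1_x heuristics. It is not the repository's implementation: `ENTRY_POINTS`, `CASE_PAGES`, `label_use_cases` and all page names are hypothetical and would have to reflect your own web information system.

```python
# Hypothetical entry points and page lists; real values depend on the web system.
ENTRY_POINTS = {"/login.jsp": "login", "/search.jsp": "search"}
CASE_PAGES = {
    "login": {"/login.jsp", "/welcome.jsp"},
    "search": {"/search.jsp", "/results.jsp", "/detail.jsp"},
}


def label_use_cases(pages):
    """Label each requested page of one session with a use case name (or None)."""
    current = None
    labels = []
    for page in pages:
        if page in ENTRY_POINTS:
            # An entry point page opens a new use case.
            current = ENTRY_POINTS[page]
        elif current is not None and page not in CASE_PAGES[current]:
            # The page does not belong to the current use case, so close it.
            current = None
        labels.append(current)
    return labels


session = [
    "/login.jsp", "/welcome.jsp", "/search.jsp",
    "/results.jsp", "/about.jsp", "/detail.jsp",
]
print(label_use_cases(session))
```

In this sketch a page outside the current use case closes it, so later pages stay unlabelled until another entry point appears. A finer-grained variant could instead keep the use case open across unrelated pages.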
License
----
GPLv3