{"id":24017590,"url":"https://github.com/piraces/tomcat-logs-utilities","last_synced_at":"2025-11-18T14:35:47.780Z","repository":{"id":93937776,"uuid":"53976769","full_name":"piraces/Tomcat-Logs-Utilities","owner":"piraces","description":"A collection of Python scripts for parsing and applying heuristics in tomcat access logs, used in the Project \"Process Mining for Security\"","archived":false,"fork":false,"pushed_at":"2017-02-05T19:21:29.000Z","size":3740,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-25T20:43:29.133Z","etag":null,"topics":["csv","process-mining","python","tomcat-log"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/piraces.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-03-15T20:36:42.000Z","updated_at":"2023-10-26T13:00:38.000Z","dependencies_parsed_at":"2023-04-01T13:47:39.000Z","dependency_job_id":null,"html_url":"https://github.com/piraces/Tomcat-Logs-Utilities","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/piraces/Tomcat-Logs-Utilities","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piraces%2FTomcat-Logs-Utilities","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piraces%2FTomcat-Logs-Utilities/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piraces%2FTomcat-Logs-Utilities/releases","manifests_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piraces%2FTomcat-Logs-Utilities/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/piraces","download_url":"https://codeload.github.com/piraces/Tomcat-Logs-Utilities/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piraces%2FTomcat-Logs-Utilities/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":285068425,"owners_count":27109462,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-18T02:00:05.759Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","process-mining","python","tomcat-log"],"created_at":"2025-01-08T09:42:03.571Z","updated_at":"2025-11-18T14:35:47.763Z","avatar_url":"https://github.com/piraces.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Process Mining for Security - Tomcat Logs Utilities\n\nThis repository contains an extensible Python script for parsing Apache Tomcat logs for several purposes.\n\nThis tool has been used for tests with the [\"Process Mining for Security\" methodology](http://sid.cps.unizar.es/PMS/).\n\nThe related project can be found in this repository in PDF format [here](docs/TAZ-TFG-2016-2185.pdf).\n\n## Functions\n\nThis pre-processing tool can do the following with Tomcat web server (access) logs:\n  - 
Convert raw logs to CSV, with a custom or default header (obtained from the Tomcat logging configuration).\n  - Apply a session heuristic to identify the different sessions in a given log, if session information is not already present.\n  - Apply multiple use case heuristics to determine the possible use cases in a session.\n\nThe tool also performs these tasks in the background:\n  - Search for and remove bad characters from raw logs.\n  - Record \"strange\" events identified while pre-processing the logs (written to a single file).\n  - Choose and apply the desired delimiter for the resulting CSV file.\n  - Extract statistics from the pre-processing run.\n\nThe main purpose of the tool is to validate the heuristics proposed in the [\"Process Mining for Security\" methodology](http://sid.cps.unizar.es/PMS/), and to enable several process mining tasks on raw Tomcat logs.\n\n## Important Notes\n\nNote that the use case heuristics only work on the logs of a custom system; they have to be changed to pre-process different web information systems. To make these changes, modify the \"cases\" and \"paths\" arrays in cases_heuristic.py to reflect the main behaviour of your web information system.\n\nAlso, if you want to use a custom header, you'll have to modify the global variables in parser.py.\n\n## Installation\n\nThis tool requires [Python 2.x](https://www.python.org) to run. 
It does not work with Python 3.\n\nIt does not require additional packages.\n\n## Execution\nInvoke the tool with the following format: python parser.py inputFile outputFile (default|custom) [heuristicName].\n\nFor example:\n```sh\n$ python parser.py log.txt output.csv default h1_3\n```\n\nThe above example takes \"log.txt\", pre-processes it, applies the specified heuristic (and the session heuristic), and outputs the CSV needed for subsequent tasks.\n\n## Available heuristics\n\n**The following heuristics are available:** h1_0, h1_1, h1_2, h1_3, h2.\n\n- The **H1_x** heuristics check the \"entrypoint\" of use cases to detect a new use case, and a list of common use case pages to check the current use case. Different versions are provided, with different levels of \"granularity\" (extra behaviors detected).\n- The **H2** heuristic checks the longest possible path of use cases to detect a new use case, and a list of common use case pages to check the current use case. Furthermore, it considers extra behaviors (from bots and other strange and correct cases).\n\n\nLicense\n----\n\nGPLv3\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpiraces%2Ftomcat-logs-utilities","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpiraces%2Ftomcat-logs-utilities","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpiraces%2Ftomcat-logs-utilities/lists"}