{"id":17679990,"url":"https://github.com/costajob/apache_log_parser","last_synced_at":"2026-03-01T14:03:41.276Z","repository":{"id":77434077,"uuid":"62630221","full_name":"costajob/apache_log_parser","owner":"costajob","description":"Apache log parser by filters in Crystal.","archived":false,"fork":false,"pushed_at":"2017-04-04T07:24:50.000Z","size":97,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-12T22:55:19.088Z","etag":null,"topics":["apache","apache2","crystal","crystal-lang","http-logs","parser"],"latest_commit_sha":null,"homepage":null,"language":"Crystal","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/costajob.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-07-05T11:04:05.000Z","updated_at":"2022-10-12T23:33:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"d0a89caf-98d1-409e-b5d0-2f82794d928f","html_url":"https://github.com/costajob/apache_log_parser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/costajob/apache_log_parser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Fapache_log_parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Fapache_log_parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Fapache_log_parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Fapache_log_parser/manif
ests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/costajob","download_url":"https://codeload.github.com/costajob/apache_log_parser/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/costajob%2Fapache_log_parser/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29970544,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T13:32:00.443Z","status":"ssl_error","status_checked_at":"2026-03-01T13:32:00.084Z","response_time":124,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache","apache2","crystal","crystal-lang","http-logs","parser"],"created_at":"2024-10-24T09:05:03.655Z","updated_at":"2026-03-01T14:03:41.254Z","avatar_url":"https://github.com/costajob.png","language":"Crystal","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Table of Contents\n* [Scope](#scope)\n  * [Alternatives](#alternatives)\n* [Log format](#log-format)\n  * [Captured fields](#captured-fields)\n* [Usage](#usage)\n  * [Help](#help)\n  * [Output](#output)\n  * [Filters](#filters)\n  * [CSV export](#csv-export)\n  * [Examples](#examples)\n* [Performance](#performance)\n  * [Platform](#platform)\n  * [Results](#results)\n  * [Considerations](#considerations)\n\n## Scope\nThe scope of this CLI program is to scan [Apache 
HTTPD](https://httpd.apache.org/) logs in order to highlight possible DDoS attacks and/or other types of events.\n\n### Alternatives\nSome alternatives exist:\n* heavyweight log [analysis tools](https://www.apacheviewer.com/), for which this CLI is not a replacement\n* pure command line scripts via [AWK](http://www.the-art-of-web.com/system/logs/), which I consider elegant, but a maintenance nightmare when filters start piling up\n\n## Log format\nThe assumed log format is the following:\n```log\n23.63.227.241 - - [03/Jul/2016:03:56:21 +0100] \"GET / HTTP/1.1\" 302 94 \"-\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko\" \"192.40.202.240\"\n```\nNote that you can modify the regex used to capture log file data by using an [environment variable](#custom-regex).\n\n### Captured fields\nThe following fields are captured from each line by using a named-group regex:\n* *time*: the log time\n* *request*: the HTTP request received by the server  \n* *status*: the HTTP status of the response\n* *user_agent*: the user agent data\n* *true_ip*: the true IP of the client, since most applications run behind a CDN\n\n## Usage\nInstall the latest Crystal version and clone the repository: \n```shell\ngit clone https://github.com/costajob/apache_log_parser.git\n```\nMove into the cloned repository and compile the main file to get the CLI program:\n```shell\ncrystal build --release src/apache_log_parser.cr\n```\nMove the resulting binary into a directory in your PATH.\n\n### Help\nOnce compiled, you can check the program help by typing:\n```shell\napache_log_parser -h\nUsage: apache_log_parser -s /logs -f 2016-06-30T00:00:00+0100 -t 2016-07-04T00:00:00+0100 -i 66.249.66.63 -c 20* -r send_mail -a iphone\n    -s SRC, --src=SRC                Specify log files path [cwd]\n    -f FROM, --from=FROM             Filter requests from this time\n    -t TO, --to=TO                   Filter requests until this time\n    -c CODE, --code=CODE             Filter HTTP code by regex\n    -i IPS, 
--ips=IPS                Filter by list of true client IPs\n    -r REQUEST, --request=REQUEST    Filter HTTP request by regex\n    -a AGENT, --agent=AGENT          Filter user agent by regex\n    -h, --help                       Show this help\n```\n\n### Output\nThe CLI starts scanning files from the specified source path (defaults to the CWD). \nThe results are printed directly to STDOUT, displaying hits grouped by hour and by true IP:\n```shell\napache_log_parser --src=\u003cpath_to_gz_logs\u003e\n\naccess_log.gz                  17        \n\n\nHOUR                           HITS      \n----------------------------------------\n2016-07-03 03h                 17\n\nTRUE IP                        HITS      \n----------------------------------------\n126.245.6.49                   3\n221.127.193.144                3\n182.139.30.248                 2\n211.157.178.224                1\n61.148.244.148                 1\n37.104.78.137                  1\n153.182.1.52                   1\n165.225.96.76                  1\n59.173.177.227                 1\n```\n\n#### Global report\nIf more than one log file is scanned, a global report collecting all of the parsed data is also printed.\n\n#### Highlight output\nDepending on the standard traffic of your server, you may want to highlight the results that are greater than a specified limit:\n```shell\nHIGHLIGHT=200000 apache_log_parser --src=\u003cpath_to_gz_logs\u003e\n```\n\n### Filters\nYou can refine results by combining different filters:\n* from time\n* to time\n* HTTP code by regex (can be negated)\n* list of true client IPs\n* HTTP request by regex (can be negated)\n* user agent by regex (can be negated)\n\n#### Limit data\nSince the list of true IPs could be large, you can limit the number of printed entries by using an environment variable:\n```shell\nLIMIT=10 apache_log_parser --src=\u003cpath_to_gz_logs\u003e\n``` \n\n#### Custom regex\nIn case you need to specify a custom regex to capture log data, you can use an 
environment variable (remember to use the same group names):\n```shell\nREGEX=\"^(?\u003ctrue_client_ip\u003e(?:[0-9]{1,3}\\.){3}[0-9]{1,3}|-)\" apache_log_parser\n```\n\n### CSV export\nAt the end of the scan you are asked whether to export a CSV with the filtered data into the current folder (or by defining the EXPORT environment variable). \nRemember that the `user_agent`, `request` and `code` data are only captured if the corresponding filters have been specified.\nIt is also possible to skip the prompt by using an environment variable:\n```shell\nASK=n apache_log_parser\n```\n\n### Examples\n\n#### Detect bots\n```shell\napache_log_parser --src=\u003cpath_to_gz_logs\u003e --agent=\"(spring|google)bot\"\n``` \n\n#### Detect errors\n```shell\napache_log_parser --src=\u003cpath_to_gz_logs\u003e --code=50*\n```\n\n#### Detect specific HTTP verb\n```shell\napache_log_parser --src=\u003cpath_to_gz_logs\u003e --request=^post\n```\n\n#### Check specific time window\nYou can specify different time zones and they will be observed:\n```shell\napache_log_parser --src=\u003cpath_to_gz_logs\u003e --from=2016-07-03T04:10:13+0200 --to=2016-07-03T05:33:01+0400\n```\n\n#### Exclude specific results\nBy using the negation form `-` (available for status, request and user agent), it is possible to exclude matching results:\n```shell\napache_log_parser --src=\u003cpath_to_gz_logs\u003e --agent=-iphone\n```\n\n#### Combining filters\nYou can combine available filters for more granular data analysis.\n```shell\napache_log_parser --src=\u003cpath_to_gz_logs\u003e \\\n                  --from=2016-07-03T04:10:13+0100 \\\n                  --to=2016-07-03T05:33:01+0100 \\\n                  --code=302 \\\n                  --ips=66.249.66.63,61.148.244.148 \\\n                  --request=jpg \\\n                  --agent=iphone\n```\n\n## Performance\nI tested this library with a compressed Apache log of 126MB (about 1.6GB uncompressed), by applying different filters and a combination of them 
all.  \nI measured execution time by using the standard *time* command; memory consumption was recorded via Xcode's Instruments.  \n\n### Platform\nThe following benchmarks were measured on a MacBook Pro 15 (late 2015), 4 CPUs, 16GB RAM.\n\n### Results\n\n|  Applied filter/s      | Total results      | Execution time     |   RAM peak (MB) |\n| :--------------------- | -----------------: | -----------------: |---------------: |\n| no filters             |           3917386  |          2m9.971s  |         601.99  |\n| from/to                |           1803951  |         2m14.297s  |         331.42  |\n| code                   |           3183506  |         2m14.902s  |         712.77  |\n| ips                    |            121250  |          2m6.322s  |          28.86  |\n| request                |           2963684  |         2m22.195s  |         650.17  |\n| agent                  |           1695367  |         2m39.292s  |         612.25  |\n| combined               |              8977  |         2m47.704s  |           7.27  |\n\n### Considerations\nExecution time is CPU-bound and remains fairly consistent regardless of the filters used (between about 2m6s and 2m48s in the table above).  \nRAM consumption strongly depends on the number of fetched records and on the kind of data fetched, user agent and request being the largest.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcostajob%2Fapache_log_parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcostajob%2Fapache_log_parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcostajob%2Fapache_log_parser/lists"}