{"id":13617443,"url":"https://github.com/nielsbasjes/logparser","last_synced_at":"2025-05-15T07:08:01.578Z","repository":{"id":10476229,"uuid":"12652963","full_name":"nielsbasjes/logparser","owner":"nielsbasjes","description":"Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Flink, Beam, Storm, Drill, ...","archived":false,"fork":false,"pushed_at":"2025-05-06T13:31:48.000Z","size":2922,"stargazers_count":159,"open_issues_count":2,"forks_count":41,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-05-12T01:08:42.894Z","etag":null,"topics":["apache","beam","drill","flink","hadoop","hive","httpd","java","logformat","nginx","parse","parser"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"serenity-dojo/vet-clinic","license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nielsbasjes.png","metadata":{"files":{"readme":"README-Hive.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"nielsbasjes","custom":"https://www.paypal.me/nielsbasjes"}},"created_at":"2013-09-06T19:48:43.000Z","updated_at":"2025-05-06T13:30:34.000Z","dependencies_parsed_at":"2023-10-22T18:28:21.107Z","dependency_job_id":"878c6aa0-e76a-40a7-b24f-ca47030b0ed0","html_url":"https://github.com/nielsbasjes/logparser","commit_stats":{"total_commits":1073,"total_committers":4,"mean_commits":268.25,"dds":"0.34669151910531226","last_synced_commit":"a0e9e6d2f3ac233b6cba0a033ca052bea6c57a1e"},"previous_names":[],"tags_count":37,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/nielsbasjes%2Flogparser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nielsbasjes%2Flogparser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nielsbasjes%2Flogparser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nielsbasjes%2Flogparser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nielsbasjes","download_url":"https://codeload.github.com/nielsbasjes/logparser/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254292043,"owners_count":22046426,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache","beam","drill","flink","hadoop","hive","httpd","java","logformat","nginx","parse","parser"],"created_at":"2024-08-01T20:01:41.782Z","updated_at":"2025-05-15T07:07:56.561Z","avatar_url":"https://github.com/nielsbasjes.png","language":"Java","readme":"Hive\n====\n\nThe SerDe (it's really only a Deserializer) can be used to present an Apache HTTPD logfile as a table in Hive.\n\nThis is an annotated example of how you could make the logfiles directly accessible through Hive.\n\nFirst we must ensure that Hive has the right jar file available. 
This can be done either by using the ADD JAR option in the Hive CLI\n or by installing it on the cluster.\n\n    ADD JAR target/httpdlog-serde-*-udf.jar;\n\nWe can now define an external table; the supported column types are STRING, BIGINT and DOUBLE.\n\n    CREATE EXTERNAL TABLE nbasjes.clicks (\n         ip           STRING\n        ,timestamp    BIGINT\n        ,useragent    STRING\n        ,referrer     STRING\n        ,bui          STRING\n        ,screenHeight BIGINT\n        ,screenWidth  BIGINT\n    )\n\nOf course we must specify the class name of the Deserializer that does the heavy lifting.\n\n    ROW FORMAT SERDE 'nl.basjes.parse.apachehttpdlog.ApacheHttpdlogDeserializer'\n\nThe bulk of the configuration lies in the SERDEPROPERTIES.\n\nThere are currently 4 types of options you can/must put in there:\n\n- \"logformat\" = \"[Apache httpd logformat]\"\n- \"field:[columnname]\" = \"[Field]\"\n- \"map:[field]\" = \"[new type]\"\n- \"load:[classname that implements Dissector]\" = \"[initialization string sent to the initializeFromSettingsParameter method]\"\n\nNote that the order of the various settings in the SERDEPROPERTIES is irrelevant.\n\n    WITH SERDEPROPERTIES (\n\n**\"logformat\" = \"[Apache httpd logformat]\"**\n\nThis is the LogFormat specification taken straight from the Apache HTTPD config file.\n\n        \"logformat\"       = \"%h %l %u %t \\\"%r\\\" %\u003es %b \\\"%{Referer}i\\\" \\\"%{User-Agent}i\\\" \\\"%{Cookie}i\\\" %T %V\"\n\n**\"field:[columnname]\" = \"[Field]\"**\n\nEach column needs a property of this type so that the system knows where to get its content from.\n\n        ,\"field:timestamp\" = \"TIME.EPOCH:request.receive.time.epoch\"\n        ,\"field:ip\"        = \"IP:connection.client.host\"\n        ,\"field:useragent\" = \"HTTP.USERAGENT:request.user-agent\"\n\n**\"map:[field]\" = \"[new type]\"**\n\nOnly used when mapping a specific field to a different type.\n\n        ,\"map:request.firstline.uri.query.g\"=\"HTTP.URI\"\n        
,\"map:request.firstline.uri.query.r\"=\"HTTP.URI\"\n\n        ,\"field:referrer\"  = \"STRING:request.firstline.uri.query.g.query.referrer\"\n        ,\"field:bui\"       = \"HTTP.COOKIE:request.cookies.bui\"\n\n**\"load:[classname that implements Dissector]\" = \"[initialization string sent to the initializeFromSettingsParameter method]\"**\n\nOnly used when there is a custom Dissector implementation that needs to be loaded in addition to the regular Dissectors.\n\n        ,\"load:nl.basjes.parse.httpdlog.dissectors.ScreenResolutionDissector\" = \"x\"\n        ,\"map:request.firstline.uri.query.s\" = \"SCREENRESOLUTION\"\n        ,\"field:screenHeight\" = \"SCREENHEIGHT:request.firstline.uri.query.s.height\"\n        ,\"field:screenWidth\"  = \"SCREENWIDTH:request.firstline.uri.query.s.width\"\n    )\n\nFinally we specify that the data is stored as a TEXTFILE and where the files are located.\n\n    STORED AS TEXTFILE\n    LOCATION \"/user/nbasjes/clicks\";\n\n\nComplete example\n====\n\n    ADD JAR target/httpdlog-serde-*-udf.jar;\n\n    CREATE EXTERNAL TABLE nbasjes.clicks (\n         ip           STRING\n        ,timestamp    BIGINT\n        ,useragent    STRING\n        ,referrer     STRING\n        ,bui          STRING\n        ,screenHeight BIGINT\n        ,screenWidth  BIGINT\n    )\n\n    ROW FORMAT SERDE 'nl.basjes.parse.apachehttpdlog.ApacheHttpdlogDeserializer'\n\n    WITH SERDEPROPERTIES (\n\n        \"logformat\"       = \"%h %l %u %t \\\"%r\\\" %\u003es %b \\\"%{Referer}i\\\" \\\"%{User-Agent}i\\\" \\\"%{Cookie}i\\\" %T %V\"\n\n        ,\"field:timestamp\" = \"TIME.EPOCH:request.receive.time.epoch\"\n        ,\"field:ip\"        = \"IP:connection.client.host\"\n        ,\"field:useragent\" = \"HTTP.USERAGENT:request.user-agent\"\n\n        ,\"map:request.firstline.uri.query.g\"=\"HTTP.URI\"\n        ,\"map:request.firstline.uri.query.r\"=\"HTTP.URI\"\n\n        ,\"field:referrer\"  = \"STRING:request.firstline.uri.query.g.query.referrer\"\n        
,\"field:bui\"       = \"HTTP.COOKIE:request.cookies.bui\"\n\n        ,\"load:nl.basjes.parse.httpdlog.dissectors.ScreenResolutionDissector\" = \"x\"\n        ,\"map:request.firstline.uri.query.s\" = \"SCREENRESOLUTION\"\n        ,\"field:screenHeight\" = \"SCREENHEIGHT:request.firstline.uri.query.s.height\"\n        ,\"field:screenWidth\"  = \"SCREENWIDTH:request.firstline.uri.query.s.width\"\n    )\n    STORED AS TEXTFILE\n    LOCATION \"/user/nbasjes/clicks\";\n\nLicense\n===\n    Apache HTTPD \u0026 NGINX Access log parsing made easy\n    Copyright (C) 2011-2023 Niels Basjes\n\n    Licensed under the Apache License, Version 2.0 (the \"License\");\n    you may not use this file except in compliance with the License.\n    You may obtain a copy of the License at\n\n    https://www.apache.org/licenses/LICENSE-2.0\n\n    Unless required by applicable law or agreed to in writing, software\n    distributed under the License is distributed on an \"AS IS\" BASIS,\n    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n    See the License for the specific language governing permissions and\n    limitations under the License.\n","funding_links":["https://github.com/sponsors/nielsbasjes","https://www.paypal.me/nielsbasjes"],"categories":["Java","日志库"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnielsbasjes%2Flogparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnielsbasjes%2Flogparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnielsbasjes%2Flogparser/lists"}