{"id":24801321,"url":"https://github.com/willthefarmer/apache-logs-to-mysql","last_synced_at":"2025-04-11T06:02:27.021Z","repository":{"id":260065945,"uuid":"875067037","full_name":"WillTheFarmer/apache-logs-to-mysql","owner":"WillTheFarmer","description":"Quickly ingest folders of Apache log files into a normalized database. Automate processing files from multiple domains and multiple servers with complete Data Lineage. Small codebase \u0026 simple setup.","archived":false,"fork":false,"pushed_at":"2025-04-02T20:45:32.000Z","size":25600,"stargazers_count":3,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-02T21:34:53.567Z","etag":null,"topics":["apache-log","apache-logging","apache-logs","apache2","log-management","log-parser","mariadb","mariadb-database","mariadb-mysql","mysql","mysql-database","mysql-schema","python3","sql"],"latest_commit_sha":null,"homepage":"https://willthefarmer.github.io/","language":"SQL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WillTheFarmer.png","metadata":{"files":{"readme":".github/README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":".github/SECURITY.md","support":".github/SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"buy_me_a_coffee":"WillTheFarmer"}},"created_at":"2024-10-19T02:49:19.000Z","updated_at":"2025-04-02T20:45:36.000Z","dependencies_parsed_at":"2024-11-19T04:15:36.014Z","dependency_job_id":"91b9b13f-19ad-440e-ba67-080b8f7b8499","html_url":"https://github.com/WillTheFarmer/apache-logs-to-mysql","commit_stats":null,"previous_names":["willthefa
rmer/apachelogs2mysql","willthefarmer/apache-logs-to-mysql"],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WillTheFarmer%2Fapache-logs-to-mysql","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WillTheFarmer%2Fapache-logs-to-mysql/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WillTheFarmer%2Fapache-logs-to-mysql/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WillTheFarmer%2Fapache-logs-to-mysql/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WillTheFarmer","download_url":"https://codeload.github.com/WillTheFarmer/apache-logs-to-mysql/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248351383,"owners_count":21089271,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-log","apache-logging","apache-logs","apache2","log-management","log-parser","mariadb","mariadb-database","mariadb-mysql","mysql","mysql-database","mysql-schema","python3","sql"],"created_at":"2025-01-30T04:28:17.675Z","updated_at":"2025-04-11T06:02:26.979Z","avatar_url":"https://github.com/WillTheFarmer.png","language":"SQL","funding_links":["https://buymeacoffee.com/WillTheFarmer"],"categories":[],"sub_categories":[],"readme":"# Database designed for Apache log data analysis \n![Entity Relationship Diagram](./assets/entity_relationship_diagram.png)\n## Python handles File Processing \u0026 Database handles Data 
Processing\nApacheLogs2MySQL consists of two Python Modules \u0026 one Database Schema ***apache_logs*** to automate importing Access \u0026 Error files, normalizing log data into database and generating a well-documented data lineage audit trail.\n\nImports Access Logs in LogFormats - ***common***, ***combined*** and ***vhost_combined*** \u0026 additional ***csv2mysql*** LogFormat defined below.\n\nImports Error Logs in ***default*** ErrorLogFormat \u0026 ***additional*** ErrorLogFormat defined below performing data harmonization \non Apache Codes \u0026 Messages, System Codes \u0026 Messages, and Log Messages to create a unified, standardized dataset.\n\nAll processing stages (child processes) are encapsulated within one \"Import Load\" (parent process) that captures process metrics, notifications and errors into Database import tables. \nEvery log data record is traceable back to the computer, path, file, load process, parse process and import process the data originates from.\n\nMultiple Access and Error logs and formats can be loaded, parsed and imported along with User Agent parsing and IP Address Geolocation retrieval processes within a single \"Import Load\" execution. \n\nA single \"Import Load\" execution can also be configured to only load logs to Server (single child process) leaving other processes to be executed within another \"Import Load\" on a centralized computer.\n### Process Messages in Console - 4 LogFormats, 2 ErrorLogFormats \u0026 6 Stored Procedures can be processed in a single Import Load execution\n![Processing Messages Console](./assets/processing_messages_console.png)\n### Application runs on Windows, Linux \u0026 MacOS - Database runs on MySQL \u0026 MariaDB\nThis is a fast, reliable processing application with detailed logging and two stages of data parsing. \nFirst stage is performed in `LOAD DATA LOCAL INFILE` statements. 
\nSecond stage is performed in `process_access_parse` and `process_error_parse` Stored Procedures.\n\nPython handles polling of log file folders and executing Database LOAD DATA, Stored Procedures, Stored Functions and SQL Statements. Python drives the application, but MySQL or MariaDB does all Data Manipulation \u0026 Processing.\n\nThe application determines which files have been processed using the `apache_logs.import_file` TABLE. \nEach imported file has a record with name, path, size, created and modified attributes, inserted during `processLogs`.\n\nThe application runs with no need for user interaction. The application does not require file deletion, so files can be kept for later reference.\n\nOn servers, run the application in conjunction with [logrotate](https://github.com/logrotate/logrotate) using [configuration file directives](https://man7.org/linux/man-pages/man8/logrotate.8.html) - `dateext`, `rotate`, `olddir`, `nocompress`, `notifempty`, `maxage`.\nSet `WATCH_PATH` to the same folder as `olddir` and configure logrotate to delete files.\n\nOn centralized computers, the environment variables `BACKUP_DAYS` and `BACKUP_PATH` can be configured to remove files from `WATCH_PATH`, reducing `apache_logs.importFileExists` executions in `processLogs` when tens of thousands of files exist in the `WATCH_PATH` subfolder structure. If `BACKUP_DAYS` is set to 0, files are never moved or deleted from the `WATCH_PATH` subfolder structure. Setting `BACKUP_DAYS` to a positive number copies files to `BACKUP_PATH`, creating a subfolder structure identical to `WATCH_PATH` as files are copied. `BACKUP_DAYS` is the number of days after a file is first added to the `apache_logs.import_file` TABLE before the file is moved to `BACKUP_PATH`. Once a file is copied, it is deleted from `WATCH_PATH`. Setting `BACKUP_DAYS` = -1 deletes files from `WATCH_PATH` without copying them to `BACKUP_PATH`. 
When `BACKUP_DAYS` is set to -1, files are deleted from `WATCH_PATH` the next time `processLogs` is executed.\n\nLog-level variables can be set to display Process Messages in the console or insert them into [PM2](https://github.com/Unitech/pm2) logs for every process step. \nAll import errors in Python `processLogs` (client) and Stored Procedures (server) are inserted into the `apache_logs.import_error` TABLE.\nThis is the only schema table that uses ENGINE=MYISAM, so error records are not lost to TRANSACTION ROLLBACKS.\n\nLogging functionality, database design and table relationship constraints produce both physical and logical integrity. \nThis enables a complete audit trail, providing the ability to determine who, what, when and where each log record originated from.\n\nAll folder path, filename pattern, logging, processing and Database connection setting variables are in the .env file for easy installation and maintenance.\n\nThe client `watch4logs` module can run in the [PM2](https://github.com/Unitech/pm2) daemon process manager, or the `logs2mysql` module can run in [logrotate's](https://github.com/logrotate/logrotate) apache `postrotate` configuration, for 24/7 online processing on multiple web servers feeding a single Server module simultaneously.\n### Valuable Data Enrichment \u0026 Visual Enhancements\n***IP Geolocation data*** integration using the [MaxMind GeoIP2](https://pypi.org/project/geoip2/) Python API provides IP country, subdivision, city, system organization, \nnetwork and coordinates information, stored and normalized into 6 Database Schema tables.\n\nThe application requires two GeoLite Databases - ***City*** \u0026 ***ASN***. GeoLite databases are subsets of the commercial databases with reduced coverage and accuracy. 
The application was tested with these databases: \n1) GeoLite2 databases at [MaxMind](https://www.maxmind.com/en/geolite-free-ip-geolocation-data) available under MaxMind's GeoLite End User Agreement (EULA), which incorporates Creative Commons terms.\n\n2) DB-IP Lite databases at [DB-IP](https://db-ip.com/db/lite.php) available under the Creative Commons Attribution 4.0 International License.\n\n***User-Agent data*** integration using [user-agents](https://pypi.org/project/user-agents/) provides browser, device and operating system information, stored and normalized into 11 Database Schema tables.\n\n[MySQL2ApacheECharts](https://github.com/willthefarmer/mysql-to-apache-echarts) is a ***visualization tool*** for the Database Schema ***apache_logs***, currently under development. The web interface is built with the [Express](https://github.com/expressjs/express) web application framework, with Drill Down Capability, \n\u0026 the [Apache ECharts](https://github.com/apache/echarts) framework for Data Visualization.\n## Four Supported Access Log Formats\nApache uses the same Standard Access LogFormats (***common***, ***combined***, ***vhost_combined***) on all 3 platforms. Each LogFormat adds 2 Format Strings to the previous one. \nFormat String descriptions are listed below each LogFormat. Information from: https://httpd.apache.org/docs/2.4/mod/mod_log_config.html#logformat \n```\nLogFormat \"%h %l %u %t \\\"%r\\\" %\u003es %O\" common\n```\n|Format String|Description|\n|-------------|-----------|\n|%h|Remote hostname. Will log the IP address if HostnameLookups is set to Off, which is the default. If it logs the hostname for only a few hosts, you probably have access control directives mentioning them by name.|\n|%l|Remote logname. Returns a dash unless \"mod_ident\" is present and IdentityCheck is set On. IdentityCheck can cause serious latency problems accessing the server, since every request requires a lookup to be performed.| \n|%u|Remote user if the request was authenticated. 
May be bogus if return status (%s) is 401 (unauthorized).|\n|%t|Time the request was received, in the format [18/Sep/2011:19:18:28 -0400]. The last number indicates the timezone offset from GMT.|\n|%r|First line of request. Contains 4 format strings (%m - The request method, %U - The URL path requested not including any query string, %q - The query string, %H - The request protocol)|\n|%s|Status. For requests that have been internally redirected, this is the status of the original request. Use %\u003es for the final status.|\n|%O|Bytes sent, including headers. May be zero in rare cases such as when a request is aborted before a response is sent. You need to enable mod_logio to use this.|\n```\nLogFormat \"%h %l %u %t \\\"%r\\\" %\u003es %O \\\"%{Referer}i\\\" \\\"%{User-Agent}i\\\"\" combined\n```\n|Format String|Description - additional format strings|\n|-------------|-----------|\n|%{Referer}i|The \"Referer\" (sic) HTTP request header. This gives the site that the client reports having been referred from.|\n|%{User-Agent}i|The User-Agent HTTP request header. This is the identifying information that the client browser reports about itself.|\n```\nLogFormat \"%v:%p %h %l %u %t \\\"%r\\\" %\u003es %O \\\"%{Referer}i\\\" \\\"%{User-Agent}i\\\"\" vhost_combined\n```\n|Format String|Description - additional format strings|\n|-------------|-----------|\n|%v|The canonical ServerName of the server serving the request.|\n|%p|The canonical port of the server serving the request.|\n\nThe application is designed to use the ***csv2mysql*** LogFormat. This LogFormat uses comma-separated values and adds 8 Format Strings. 
The complete list of Format Strings is below,\nwith the added Format Strings marked ADDED.\n```\nLogFormat \"%v,%p,%h,%l,%u,%t,%I,%O,%S,%B,%{ms}T,%D,%^FB,%\u003es,\\\"%H\\\",\\\"%m\\\",\\\"%U\\\",\\\"%q\\\",\\\"%{Referer}i\\\",\\\"%{User-Agent}i\\\",\\\"%{VARNAME}C\\\",%L\" csv2mysql\n```\n|Format String|Description|\n|-------------|-----------|\n|%v|The canonical ServerName of the server serving the request.|\n|%p|The canonical port of the server serving the request.|\n|%h|Remote hostname. Will log the IP address if HostnameLookups is set to Off, which is the default.|\n|%l|Remote logname. Returns a dash unless \"mod_ident\" is present and IdentityCheck is set On. IdentityCheck can cause serious latency problems accessing the server, since every request requires a lookup to be performed.| \n|%u|Remote user if the request was authenticated. May be bogus if return status (%s) is 401 (unauthorized).|\n|%t|Time the request was received, in the format [18/Sep/2011:19:18:28 -0400]. The last number indicates the timezone offset from GMT.|\n|%I|ADDED - Bytes received, including request and headers. Enable \"mod_logio\" to use this.|\n|%O|Bytes sent, including headers. The %O format provided by mod_logio will log the actual number of bytes sent over the network. Enable \"mod_logio\" to use this.|\n|%S|ADDED - Bytes transferred (received and sent), including request and headers, cannot be zero. This is the combination of %I and %O. Enable \"mod_logio\" to use this.|\n|%B|ADDED - Size of response in bytes, excluding HTTP headers. Does not represent the number of bytes sent to the client, but the size in bytes of the HTTP response (will differ if the connection is aborted or if SSL is used).|\n|%{ms}T|ADDED - The time taken to serve the request, in milliseconds. Combining %T with a unit is available in 2.4.13 and later.|\n|%D|ADDED - The time taken to serve the request, in microseconds.|\n|%^FB|ADDED - Delay in microseconds between when the request arrived and when the first byte of the response headers is written. 
Only available if LogIOTrackTTFB is set to ON. Available in Apache 2.4.13 and later.|\n|%s|Status. For requests that have been internally redirected, this is the status of the original request.|\n|%H|The request protocol. Included in %r - First line of request.|\n|%m|The request method. Included in %r - First line of request.|\n|%U|The URL path requested, not including any query string. Included in %r - First line of request.|\n|%q|The query string (prepended with a ? if a query string exists, otherwise an empty string). Included in %r - First line of request.|\n|%{Referer}i|The \"Referer\" (sic) HTTP request header. This gives the site that the client reports having been referred from.|\n|%{User-Agent}i|The User-Agent HTTP request header. This is the identifying information that the client browser reports about itself.|\n|%{VARNAME}C|ADDED - The contents of cookie VARNAME in the request sent to the server. Only version 0 cookies are fully supported. Format String is optional.|\n|%L|ADDED - The request log ID from the error log (or '-' if nothing has been logged to the error log for this request). Look for the matching error log line to see what request caused what error.|\n## Two supported Error Log Formats\nThe application processes Error Logs with the ***default format*** for threaded MPMs (Multi-Processing Modules). If Apache 2.4 runs on any platform \nand ErrorLogFormat is not defined in the config files, this is the Error Log format.\nInformation from: https://httpd.apache.org/docs/2.4/mod/core.html#errorlogformat\n```\nErrorLogFormat \"[%{u}t] [%-m:%l] [pid %P:tid %T] %7F: %E: [client\\ %a] %M% ,\\ referer\\ %{Referer}i\"\n```\n|Format String|Description|\n|-------------|-----------|\n|%{u}t|The current time including micro-seconds|\n|%m|Name of the module logging the message|\n|%l|Loglevel of the message|\n|%P|Process ID of current process|\n|%T|Thread ID of current thread|\n|%F|Source file name and line number of the log call. 
%7F - the 7 means only display when LogLevel=debug|\n|%E|APR/OS error status code and string|\n|%a|Client IP address and port of the request|\n|%M|The actual log message|\n|%{Referer}i|The \"Referer\" (sic) HTTP request header. This gives the site that the client reports having been referred from.| \n\nThe application also processes Error Logs with an ***additional format***, which adds:\n 1) `%v - The canonical ServerName` - Adding `%v` to ErrorLogFormat is the easiest way to identify error logs for each domain. \n 2) `%L - Log ID of the request` - This is the easiest way to associate an Error record with the Access record that created it. \n Apache calls mod_unique_id.generate_log_id() only when an error occurs, so it does not degrade performance during error-free operation. \n\n***Important:*** A `Space` is required on the left side of each `Comma`, as defined below:\n```\nErrorLogFormat \"[%{u}t] [%-m:%l] [pid %P:tid %T] %7F: %E: [client\\ %a] %M% ,\\ referer\\ %{Referer}i ,%v ,%L\"\n```\nTo use this format, place `ErrorLogFormat` before `ErrorLog` in `apache2.conf` to set the error log format for the ***Server*** and all ***VirtualHosts*** on the Server.\n\n|Format String|Description - `Space` required on left side of `Commas` to parse data properly|\n|-------------|-----------|\n|%v|The canonical ServerName of the server serving the request.|\n|%L|Log ID of the request. A %L format string is also available in `mod_log_config` to correlate access log entries with error log lines. 
If [mod_unique_id](https://httpd.apache.org/docs/current/mod/mod_unique_id.html) is loaded, its unique id will be used as the log ID for requests.|\n\n### Three options to associate ServerName \u0026 ServerPort with Access \u0026 Error logs\nApache LogFormats - ***common***, ***combined*** and Apache ErrorLogFormat - ***default*** do not contain `%v - canonical ServerName` and `%p - canonical ServerPort`.\n\nIn order to consolidate logs from multiple domains, `%v - canonical ServerName` is required and `%p - canonical ServerPort` is optional.\n\nOptions to associate ServerName and ServerPort with Access and Error logs are:\n\n1) The image shows three configurations. Top (A) is the default, and Bottom (C) will SET the `server_name` and `server_port` COLUMNS of the `load_error_default` and `load_access_combined` TABLES during Python `LOAD DATA LOCAL INFILE` execution.\n\n![load_settings_variables.png](./assets/load_settings_variables.png)\n\n2) Manually ***UPDATE*** the `server_name` and `server_port` COLUMNS of the `load_error_default` and `load_access_combined` TABLES after the STORED PROCEDURES `process_access_parse` \nand `process_error_parse` and before `process_access_import` and `process_error_import`. \nIf `%v` or `%p` Format Strings exist, parsing into the `server_name` and `server_port` COLUMNS is performed in the parse processes. \nData Normalization is performed in the import processes. \n\n3) Populate the `server_name` and `server_port` COLUMNS in the `import_file` TABLE before the import processes. This populates all records associated with the file.\nThis option only updates records with NULL values in the ***load_tables*** `server_name` and `server_port` COLUMNS while executing \nSTORED PROCEDURES `process_access_import` and `process_error_import`. 
\n\nUPDATE commands can populate both Access and Error Logs if the ***\"Log File Names\"*** are related to the VirtualHost, similar to:\n```\n ErrorLog ${APACHE_LOG_DIR}/farmfreshsoftware.error.log\n CustomLog ${APACHE_LOG_DIR}/farmfreshsoftware.access.log csv2mysql\n```\nLog file naming conventions enable the use of UPDATE statements:\n```\nUPDATE apache_logs.import_file SET server_name='farmfreshsoftware.com', server_port=443 WHERE server_name IS NULL AND name LIKE '%farmfreshsoftware%';\nUPDATE apache_logs.import_file SET server_name='farmwork.app', server_port=443 WHERE server_name IS NULL AND name LIKE '%farmwork%';\nUPDATE apache_logs.import_file SET server_name='ip255-255-255-255.us-east.com', server_port=443 WHERE server_name IS NULL AND name LIKE '%error%';\n```\n## Required Python Packages\nSingle quotes around the 'PyMySQL[rsa]' package name are required on macOS.\n\n|Python Package|Installation Command|GitHub Repository|\n|--------------|---------------|------------|\n|[PyMySQL](https://pypi.org/project/PyMySQL/)|python -m pip install PyMySQL[rsa]|[PyMySQL/PyMySQL](https://github.com/PyMySQL/PyMySQL)|\n|[user-agents](https://pypi.org/project/user-agents/)|python -m pip install pyyaml ua-parser user-agents|[selwin/python-user-agents](https://github.com/selwin/python-user-agents)|\n|[watchdog](https://pypi.org/project/watchdog/)|python -m pip install watchdog|[gorakhargosh/watchdog](https://github.com/gorakhargosh/watchdog/tree/master)|\n|[python-dotenv](https://pypi.org/project/python-dotenv/)|python -m pip install python-dotenv|[theskumar/python-dotenv](https://github.com/theskumar/python-dotenv)|\n|[geoip2](https://pypi.org/project/geoip2/)|python -m pip install geoip2|[maxmind/GeoIP2-python](https://github.com/maxmind/GeoIP2-python)|\n\n## Installation Instructions\nThese steps make installation quick and straightforward. The application will be ready to import Apache logs on completion.\n\n### 1. 
Python\nInstall all required packages (`requirements.txt` in repository):\n```\npip install -r requirements.txt\n```\n### 2. Database\nBefore running `apache_logs_schema.sql`, if the User Account `root`@`localhost` does not exist on the installation server, open \nthe file and perform a ***Find and Replace*** using a User Account with the DBA Role on the installation server. Copy the text below:\n```\nroot`@`localhost`\n```\nReplace the user above with a user that exists on your server. For example - `root`@`localhost` to `dbadmin`@`localhost`\n\nThe easiest way to install is to use the Database Command Line Client. Log in as a User with the DBA Role and execute the following:\n```\nsource yourpath/apache_logs_schema.sql\n```\nOnly MySQL Server needs to be configured in `my.ini`, `mysqld.cnf` or `my.cnf`, depending on platform, with the following: \n```\n[mysqld]\nlocal-infile=1\n```\n### 3. Create Database USER \u0026 GRANTS\nTo minimize data exposure and breach risks, create a Database USER for the Python module with GRANTS only to the schema objects and privileges required to execute import processes. Change the hostname from `localhost` to the hostname of the installed database if different. (`mysql_user_and_grants.sql` in repository)\n![mysql_user_and_grants.sql in repository](./assets/mysql_user_and_grants.png)\n### 4. Settings.env Variables\nSetting environment variables `ERROR`, `COMBINED`, `VHOST`, `CSV2MYSQL`, `USERAGENT` and `GEOIP` = 0 processes nothing, but does insert a record into the `import_load` TABLE indicating `processLogs` was executed.\n\n`COMBINED` processes ***common*** and ***combined*** LogFormats. `ERROR` processes ***default*** and ***additional*** ErrorLogFormats.\n\nMost configurations will only process a single LogFormat and ErrorLogFormat. Set the required formats = 1. \n\nMake sure log files are in the correct LogFormat folders. 
The application does not detect LogFormats, so data in a file placed in the wrong folder will not import properly.\n\nUse backslash `\\` for Windows paths and forward slash `/` for Linux and MacOS paths. \n\nsettings.env with default settings for Ubuntu. (`settings.env` in repository)\n![settings.env in repository](./assets/settings.png)\n### 5. Rename settings.env file to .env\nBy default, load_dotenv() looks for the standard settings file name `.env`, and the file is loaded in both `logs2mysql.py` and `watch4logs.py` with the following lines:\n```\nfrom dotenv import load_dotenv\n\nload_dotenv() # Loads variables from .env into the environment\n```\n### 6. Run Application\nIf log files exist in the folders, run `logs2mysql.py` and all files in all folders will be processed. Alternatively, run `watch4logs.py` and \ndrop one or more files into a folder; `logs2mysql.py` will then be executed. \nWhen a file is dropped into a folder, any unprocessed files in the folders are processed, whether or not the folders were previously empty.\n\nRun the import process directly:\n```\npython3 logs2mysql.py\n```\nRun the polling module:\n```\npython3 watch4logs.py\n```\n## How Python Client module CALLS Stored Procedures\nThe Python Client module CALLS Stored Procedures passing the SECOND PARAMETER = `importloadid`, which processes ONLY files \u0026 records imported by the current `processLogs` function execution.\n\nThe list below shows which Stored Procedures are called based on environment settings. \n\n1. Set environment variables `ERROR_PROCESS`, `COMBINED_PROCESS`, `VHOST_PROCESS`, `CSV2MYSQL_PROCESS`, `USERAGENT_PROCESS` and `GEOIP_PROCESS` = 0:\n\n    No Stored Procedures are executed by the Python Client module. Only LOAD DATA statements are executed, inserting raw log data into LOAD TABLES.\n\n2. Set environment variables `ERROR_PROCESS`, `COMBINED_PROCESS`, `VHOST_PROCESS`, `CSV2MYSQL_PROCESS` = 1 and `USERAGENT_PROCESS` and `GEOIP_PROCESS` = 0: \n\n    The Python Client module CALLS 2 Stored Procedures - `process_error_parse` and `process_access_parse`.  
`process_access_parse` is CALLED 3 times, each with a different FIRST PARAMETER.\n\n3. Set environment variables `ERROR_PROCESS`, `COMBINED_PROCESS`, `VHOST_PROCESS`, `CSV2MYSQL_PROCESS` = 2 and `USERAGENT_PROCESS` and `GEOIP_PROCESS` = 0: \n\n    The Python Client module CALLS 4 Stored Procedures - `process_access_parse`, `process_access_import`, `process_error_parse`, `process_error_import`. `process_access_parse` and `process_access_import` are CALLED 3 times each, each time with a different FIRST PARAMETER.\n\n4. Set environment variables `ERROR_PROCESS`, `COMBINED_PROCESS`, `VHOST_PROCESS`, `CSV2MYSQL_PROCESS` = 0 and `USERAGENT_PROCESS` and `GEOIP_PROCESS` = 1: \n\n    The Python Client module executes a SELECT on the `access_log_useragent` TABLE for records not yet normalized. If records exist, it CALLS the Stored Procedure `normalize_useragent`.\n\n    The Python Client module executes a SELECT on the `log_client` TABLE for records not yet normalized. If records exist, it CALLS the Stored Procedure `normalize_client`.\n\n5. Set environment variables `ERROR_PROCESS`, `COMBINED_PROCESS`, `VHOST_PROCESS`, `CSV2MYSQL_PROCESS` = 2 and `USERAGENT_PROCESS` and `GEOIP_PROCESS` = 1: \n\n    The Python Client module CALLS all 6 Stored Procedures.\n\n## Execute Stored Procedures from Command Line\n#### COLUMN process_status in LOAD DATA tables - load_access_combined, load_access_csv2mysql, load_access_vhost, load_error_default\n1. process_status=0 - LOAD DATA tables loaded with raw log data\n2. process_status=1 - process_error_parse or process_access_parse executed on record\n3. process_status=2 - process_error_import or process_access_import executed on record\n\nThe `process_status` COLUMN of the LOAD DATA tables determines the processing stage of files \u0026 records. Each processing stage can contain files \u0026 records from multiple `importloadid` values.\n\nExecuting Stored Procedures with SECOND PARAMETER = 'ALL' processes files \u0026 records based on their `process_status` value. 
\n\nExecuting Stored Procedures with the second parameter set to an `importloadid` value as a STRING processes ONLY files \u0026 records related to that `importloadid`.\n\nThe second parameter enables Python Client modules on multiple servers to upload simultaneously to a single Database Server `apache_logs` schema.\n\n`call_processes.sql` contains execution commands for each stored procedure, with comments explaining their functionality. (`call_processes.sql` in repository)\n![call_processes.sql in repository](./assets/call_processes.png)\n## Verify ServerNames from Command Line\n`check_domain_columns.sql` contains SQL SELECT and UPDATE statements to check, validate and update Domain data.\nLog files imported from multiple domains require a ServerName value to properly filter and report data. (`check_domain_columns.sql` in repository)\n![check_domain_columns.sql in repository](./assets/check_domain_columns.png)\n## Database Normalization\nDatabase normalization is the process of organizing data in a relational database to improve data integrity and reduce redundancy. \nNormalization ensures that data is organized in a way that makes sense for the data model and attributes, and that the database functions efficiently.\n\nThe `apache_logs` database schema currently has 55 Tables, 1040 Columns, 190 Indexes, 85 Views, 8 Stored Procedures and 90 Functions to process Apache Access logs in 4 formats \n\u0026 Apache Error logs in 2 formats. Database normalization at work!\n\nDatabase normalization is a critical process in database design with the objectives of optimizing data storage, improving data integrity, and reducing data anomalies.\nOrganizing data into normalized tables greatly enhances the efficiency and maintainability of a database system.\n## 85 Views in apache_logs schema\n![apache_logs_view_list.png](./assets/apache_logs_view_list.png)\n\nThe schema has many useful views of the Import, Access and Error primary attribute tables created in the normalization process, with simple aggregate values. These are primitive data presentations of the data warehouse. 
These are primitive data presentations of the data warehouse. \nMore complex data Slicing and Dicing is done in [MySQL2ApacheECharts](https://github.com/willthefarmer/mysql-to-apache-echarts).\n\nIf you find this code useful please contribute a :star: to the repository. It will also be encouragement to complete MySQL2ApacheECharts.\n#### Access Log View by Browser\nDatabase View - apache_logs.access_ua_browser_family_list - data from LogFormat: combined \u0026 csv2mysql\n![view-access_ua_browser_family_list.png](./assets/access_ua_browser_list.png)\n#### Access Log View by URI\nDatabase View - apache_logs.access_requri_list - data from LogFormat: combined \u0026 csv2mysql\n![view-access_requri_list](./assets/access_requri_list.png)\n#### Error Log Views\nError logs consist of three different data formats for error types. \nApplication harmonizes the 3 formats into a single standardized format and normalizes primary attributes.\nError log attribute is name of first column or first and second column.\nEach attribute has an associated table in ***apache_logs*** 
schema.\n![error_log_apache_message_list](./assets/error_log_apache_message_list.png)\n![error_log_system_message](./assets/error_log_system_message.png)\n![error_log_message_list](./assets/error_log_message_list.png)\n![error_processID_threadID_list](./assets/error_processID_threadID_list.png)\n![error_log_apache_code_list](./assets/error_log_apache_code_list.png)\n![error_log_client_list](./assets/error_log_client_list.png)\n![error_log_system_code_list](./assets/error_log_system_code_list.png)\n![error_log_module_list](./assets/error_log_module_list.png)\n![error_log_level_list](./assets/error_log_level_list.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillthefarmer%2Fapache-logs-to-mysql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwillthefarmer%2Fapache-logs-to-mysql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillthefarmer%2Fapache-logs-to-mysql/lists"}