{"id":21532059,"url":"https://github.com/digitalegesellschaft/digiges-access-logs","last_synced_at":"2025-03-17T19:28:30.100Z","repository":{"id":196931505,"uuid":"696462585","full_name":"DigitaleGesellschaft/digiges-access-logs","owner":"DigitaleGesellschaft","description":"Open source tools to analyze apache2 access logs","archived":false,"fork":false,"pushed_at":"2024-08-20T23:31:43.000Z","size":35,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-01-24T06:29:50.212Z","etag":null,"topics":["access-logs","apache2","awstats","goaccess","matomo"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DigitaleGesellschaft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-25T19:43:12.000Z","updated_at":"2024-08-20T23:31:47.000Z","dependencies_parsed_at":"2025-01-24T06:40:29.150Z","dependency_job_id":null,"html_url":"https://github.com/DigitaleGesellschaft/digiges-access-logs","commit_stats":null,"previous_names":["digitalegesellschaft/digiges-access-logs"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DigitaleGesellschaft%2Fdigiges-access-logs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DigitaleGesellschaft%2Fdigiges-access-logs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DigitaleGesellschaft%2Fdigiges-access-logs/releases","manifests_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/DigitaleGesellschaft%2Fdigiges-access-logs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DigitaleGesellschaft","download_url":"https://codeload.github.com/DigitaleGesellschaft/digiges-access-logs/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244096120,"owners_count":20397343,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["access-logs","apache2","awstats","goaccess","matomo"],"created_at":"2024-11-24T02:18:37.798Z","updated_at":"2025-03-17T19:28:30.075Z","avatar_url":"https://github.com/DigitaleGesellschaft.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\nrunme:\n  id: 01HKTA1PWQ3C4AJDY3N8ABKZ6V\n  version: v3\n---\n\nInstructions on how to analyze apache2 access logs with different open source tools.\n\n# Tools\n\n* docker (awstats)\n* goaccess (\u003e1.8)\n* matomo\n* duckdb\n\n# Commands\n\nNote: All ssh, rsync, scp commands use a configured ssh Host named 'digiges'.\n\nCreate a ./logs directory (`mkdir logs`) before you start.\n\n## Download Apache access logs of yesterday\n\nThe date in the archive name is the creation date of the archive, which is always one day after the dates of the contained logs.\n\n`scp digiges:access.log/www.digitale-gesellschaft.ch-$( date \"+%Y%m%d\" ).tar.gz ./logs`\n\n### Download archives of last n days\n\nFor example, the last 8 days, starting today (containing the logs of yesterday ;)).\n\n```bash {\"id\":\"01HKTBECEEP0TVNNEXGHAHD20R\"}\nbash ./fetch-logs.sh 
\"20231201\" \"20231231\" # or \"now -8days\"\n```\n\n## Import logs to awstats\n\nExisting logs are not overwritten.\n\n`docker run --rm -v $(pwd)/logs:/web-logs:ro -eLOG_FORMAT=1 -v awstats-db:/var/lib/awstats openmicroscopy/awstats /web-logs/www.digit\\*.gz`\n\n## Run awstats web server\n\n`docker run --rm -p 8081:8080 -v awstats-db:/var/lib/awstats openmicroscopy/awstats httpd`\n\n## Create goaccess report\n\n`zcat ./logs/www.digitale-gesellschaft.ch-*.tar.gz | goaccess -p ./goaccess.conf -o report.html -`\n\n## Filter apache access logs by URL path\n\nIn this example all GET requests to digitalerechte/ or digitalerechte are extracted.\n\n`zcat ./logs/www.digitale-gesellschaft.ch-*.tar.gz | grep --text GET | grep --text -E 'digitalerechte( |/ )H' \u003e ./logs/digitalerechte.log`\n\nThe next example extracts all requests originating from a social media campaign that used query parameter markers.\n\n`zcat ./logs/www.digitale-gesellschaft.ch-*.tar.gz | grep --text GET | grep --text -E 'digitalerechte/?\\?s=(t|i|m|x|l) H' \u003e ./logs/digitalerechte-source_query.log`\n\nThe output file can be used to create a goaccess report that only contains non-crawler visitors.\n\n`goaccess ./logs/digitalerechte-source_query.log -p ./goaccess.conf`\n\n## Remove crawlers / spiders / bots from logs\n\ngoaccess and matomo both have built-in support to remove crawlers. However, https://github.com/omrilotan/isbot offers a more flexible (and hopefully more complete) way to remove crawlers:\n\n`zcat ./logs/www.digitale-gesellschaft.ch-*.tar.gz | deno run --reload exclude-bots.ts`\n\nRemove the --reload flag to avoid downloading the latest list of crawlers before every run.\n\n## Remove irrelevant URIs from logs\n\nStatic resources such as js files and theme images are part of the WordPress theme or other plugins. These are sometimes not correctly identified as static resources, but as pages. Simply excluding those URIs might help, depending on the report use case. 
'preview_id' is the query parameter used by WordPress when previewing a post.\n\n`zcat ./logs/www.digitale-gesellschaft.ch-*.tar.gz | grep --text --invert-match -E '\\.(txt|js|php|css|png|gif|jpeg|jpg|webp|svg|env|asp|woff|woff2)' | grep --text --invert-match -E 'preview_id'`\n\n# Presets\n\n## Page\n\nGenerate a goaccess report of a particular page, identified via slug.\n\nSyntax: `report-page_slug.sh :report_name :page_slug [:min_date] [:max_date]`\n\nThe page_slug value must contain the end of the page URI without the trailing slash (/). Use '' (empty slug name) to create a report covering all paths.\n\nmin_date and max_date are of the format YYYYmmdd (e.g. '20230125') or 'now -7days'.\n\nNote that the logs must already have been fetched.\n\n```bash {\"id\":\"01HKTBECEEP0TVNNEXGMK72A14\"}\nbash ./report-page_slug.sh \"könnsch\" \"koennsch-fuer-digitale-grundrechte\" \"20231214\" \"20231231\"\n\nbash ./report-page_slug.sh \"geheimjustiz\" \"geheimjustiz-am-bundesverwaltungsgericht-kabelaufklaerung-durch-geheimdienst\" \"20240107\" \"20240207\"\n```\n\n# Hits per Weekday\n\nUse duckdb to plot a histogram that shows the number of total hits and hits by social media platform per day (including weekday).\n\nReplace the log file name in the following statement before running it:\n\n```bash\nduckdb -box -s \"DROP TABLE IF EXISTS acclogs; CREATE TABLE acclogs AS SELECT * FROM read_csv_auto('logs/grundrechte-wahren-nostatic-normalized-nobot.log', delim=' ', header=false, names = ['clientIp', 'userId', 'nA', 'datetime', 'tzOffset', 'methodAndPath', 'responseStatus', 'bytes', 'referrer', 'userAgent'], types={'datetime':'DATE'}, dateformat='[%d/%b/%Y:%H:%M:%S'); SELECT datetrunc('day', datetime) || '-' || dayofweek(datetime) AS 'week-dayofweek', count(*) FILTER (WHERE methodAndPath ILIKE '%s=x H%') AS hitsX, count(*) FILTER (WHERE methodAndPath ILIKE '%s=i H%') AS hitsInstagram, count(*) FILTER (WHERE methodAndPath ILIKE '%s=m H%') AS hitsMastodon, count(*) FILTER (WHERE methodAndPath ILIKE 
'%s=l H%') AS hitsLinkedIn, count(*) AS hitsTotal FROM acclogs GROUP BY 1 ORDER BY 1 ASC;\"\n```\n\nJust total hits per day, but in a nice bar chart in the terminal (requires YouPlot):\n\n```bash\nduckdb -s \"DROP TABLE IF EXISTS acclogs; CREATE TABLE acclogs AS SELECT * FROM read_csv_auto('logs/grundrechte-wahren-nostatic-normalized-nobot.log', delim=' ', header=false, names = ['clientIp', 'userId', 'nA', 'datetime', 'tzOffset', 'methodAndPath', 'responseStatus', 'bytes', 'referrer', 'userAgent'], types={'datetime':'DATE'}, dateformat='[%d/%b/%Y:%H:%M:%S'); COPY (SELECT datetrunc('day', datetime) || '-' || dayofweek(datetime) AS 'week-dayofweek', count(*) AS hits FROM acclogs GROUP BY 1 ORDER BY 1 ASC) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)\" | uplot bar -d, -H\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigitalegesellschaft%2Fdigiges-access-logs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdigitalegesellschaft%2Fdigiges-access-logs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigitalegesellschaft%2Fdigiges-access-logs/lists"}