{"id":15404522,"url":"https://github.com/k1low/harvest","last_synced_at":"2025-07-21T11:32:00.982Z","repository":{"id":43884259,"uuid":"166939227","full_name":"k1LoW/harvest","owner":"k1LoW","description":":beetle: Portable log aggregation tool for middle-scale system operation/troubleshooting.","archived":false,"fork":false,"pushed_at":"2024-05-07T13:12:22.000Z","size":17123,"stargazers_count":34,"open_issues_count":2,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-14T06:37:28.494Z","etag":null,"topics":["debugging","kubernetes","log-aggregation","log-tail","ssh","troubleshooting"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/k1LoW.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"k1LoW","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2019-01-22T06:23:55.000Z","updated_at":"2024-12-05T14:31:52.000Z","dependencies_parsed_at":"2024-03-16T22:59:00.596Z","dependency_job_id":"89ac5d2c-04c2-4dad-b709-f2c90fd25dc0","html_url":"https://github.com/k1LoW/harvest","commit_stats":null,"previous_names":[],"tags_count":36,"template":false,"template_full_name":null,"purl":"pkg:github/k1LoW/harvest","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k1LoW%2Fharvest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k1LoW%2Fharvest/tags","releases_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/k1LoW%2Fharvest/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k1LoW%2Fharvest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/k1LoW","download_url":"https://codeload.github.com/k1LoW/harvest/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k1LoW%2Fharvest/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266291603,"owners_count":23906298,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["debugging","kubernetes","log-aggregation","log-tail","ssh","troubleshooting"],"created_at":"2024-10-01T16:13:29.883Z","updated_at":"2025-07-21T11:32:00.963Z","avatar_url":"https://github.com/k1LoW.png","language":"Go","funding_links":["https://github.com/sponsors/k1LoW"],"categories":[],"sub_categories":[],"readme":"# Harvest [![Build Status](https://github.com/k1LoW/filt/workflows/build/badge.svg)](https://github.com/k1LoW/filt/actions) [![GitHub release](https://img.shields.io/github/release/k1LoW/harvest.svg)](https://github.com/k1LoW/harvest/releases) [![Go Report Card](https://goreportcard.com/badge/github.com/k1LoW/harvest)](https://goreportcard.com/report/github.com/k1LoW/harvest)\n\n\u003e Portable log aggregation tool for middle-scale system operation/troubleshooting.\n\n![screencast](doc/screencast.svg)\n\nHarvest provides the `hrv` command with the following features.\n\n- Agentless.\n- Portable.\n- Only 1 config file.\n- Fetch various remote/local log data via SSH/exec/Kubernetes API. 
( `hrv fetch` )\n- Output all fetched logs in the order of timestamp. ( `hrv cat` )\n- Stream various remote/local logs via SSH/exec/Kubernetes API. ( `hrv stream` )\n- Copy remote/local raw logs via SSH/exec. ( `hrv cp` )\n\n## Quick Start ( for Kubernetes )\n\n``` console\n$ hrv generate-k8s-config \u003e cluster.yml\n$ hrv stream -c cluster.yml --tag='kube_apiserver or coredns' --with-path --with-timestamp\n```\n\n## Usage\n\n### :beetle: Fetch and output remote/local log data\n\n#### 1. Set log sources (and log type) in config.yml\n\n``` yaml\n---\ntargetSets:\n  -\n    description: webproxy syslog\n    type: syslog\n    sources:\n      - 'ssh://webproxy.example.com/var/log/syslog*'\n    tags:\n      - webproxy\n      - syslog\n  -\n    description: webproxy NGINX access log\n    type: combinedLog\n    sources:\n      - 'ssh://webproxy.example.com/var/log/nginx/access_log*'\n    tags:\n      - webproxy\n      - nginx\n  -\n    description: app log\n    type: regexp\n    regexp: 'time:([^\\t]+)'\n    timeFormat: 'Jan 02 15:04:05' # Golang time format and 'unixtime'\n    timeZone: '+0900'\n    sources:\n      - 'ssh://app-1.example.com/var/log/ltsv.log*'\n      - 'ssh://app-2.example.com/var/log/ltsv.log*'\n      - 'ssh://app-3.example.com/var/log/ltsv.log*'\n    tags:\n      - app\n  -\n    description: db dump log\n    type: regexp\n    regexp: '\"ts\":\"([^\"]+)\"'\n    timeFormat: '2006-01-02T15:04:05.999-0700'\n    sources:\n      - 'ssh://db.example.com/var/log/tcpdp/eth0/dump*'\n    tags:\n      - db\n      - query\n  -\n    description: PostgreSQL log\n    type: regexp\n    regexp: '^\\[?(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2} \\w{3})'\n    timeFormat: '2006-01-02 15:04:05 MST'\n    multiLine: true\n    sources:\n      - 'ssh://db.example.com/var/log/postgresql/postgresql*'\n    tags:\n      - db\n      - postgresql\n  -\n    description: local Apache access log\n    type: combinedLog\n    sources:\n      - 'file:///path/to/httpd/access.log'\n    
tags:\n      - httpd\n  -\n    description: api on Kubernetes\n    type: k8s\n    sources:\n      - 'k8s://context-name/namespace/pod-name*'\n    tags:\n      - api\n      - k8s\n```\n\nUse `hrv configtest` to validate the config file.\n\n``` console\n$ hrv configtest -c config.yml\n```\n\n#### 2. Fetch target log data via SSH/exec/Kubernetes API ( `hrv fetch` )\n\n``` console\n$ hrv fetch -c config.yml --tag=webproxy,db\n```\n\n#### 3. Output log data ( `hrv cat` )\n\n``` console\n$ hrv cat harvest-20181215T2338+900.db --with-timestamp --with-host --with-path | less -R\n```\n\n#### 4. Count log data ( `hrv count` )\n\n``` console\n$ hrv count harvest-20191015T2338+900.db -g minute -g webproxy -b db\nts      webproxy db\n2019-09-24 08:01:00     9618    5910\n2019-09-24 08:02:00     9767    5672\n2019-09-24 08:03:00     10815   7394\n2019-09-24 08:04:00     11782   7109\n2019-09-24 08:05:00     9896    6346\n[...]\n2019-09-24 08:24:00     11619   5646\n2019-09-24 08:25:00     10541   6097\n2019-09-24 08:26:00     11336   5264\n2019-09-24 08:27:00     1102    5261\n2019-09-24 08:28:00     1318    6660\n2019-09-24 08:29:00     10362   5663\n2019-09-24 08:30:00     11136   5373\n2019-09-24 08:31:00     1748    1340\n```\n\n### :beetle: Stream remote/local logs\n\n#### 1. [Set config.yml](#1-set-log-sources-and-log-type-in-configyml)\n\n#### 2. Stream target logs via SSH/exec/Kubernetes API ( `hrv stream` )\n\n``` console\n$ hrv stream -c config.yml --with-timestamp --with-host --with-path --with-tag\n```\n\n### :beetle: Copy remote/local raw logs\n\n#### 1. [Set config.yml](#1-set-log-sources-and-log-type-in-configyml)\n\n#### 2. 
Copy remote/local raw logs to a local directory via SSH/exec ( `hrv cp` )\n\n``` console\n$ hrv cp -c config.yml\n```\n\n### --tag filter operators\n\nThe following operators can be used to filter targets:\n\n`not`, `and`, `or`, `!`, `\u0026\u0026`, `||`\n\n``` console\n$ hrv stream -c config.yml --tag='webproxy or db' --with-timestamp --with-host --with-path\n```\n\n#### `,` is converted to ` or `\n\n``` console\n$ hrv stream -c config.yml --tag='webproxy,db'\n```\n\nis converted to\n\n``` console\n$ hrv stream -c config.yml --tag='webproxy or db'\n```\n\n### --source filter\n\nFilter targets by matching their sources against a regular expression.\n\n``` console\n$ hrv fetch -c config.yml --source='app-[0-9].example'\n```\n\n## Architecture\n\n### `hrv fetch` and `hrv cat`\n\n![img](doc/fetch.png)\n\n### `hrv stream`\n\n![img](doc/stream.png)\n\n## Installation\n\n```console\n$ brew install k1LoW/tap/harvest\n```\n\nor\n\n```console\n$ go get github.com/k1LoW/harvest/cmd/hrv\n```\n\n## What is \"middle-scale system\"?\n\n- \u003c 50 instances\n- \u003c 1 million logs per `hrv fetch`\n\n### What if you are operating a large-scale/super-large-scale/hyper-large-scale system?\n\nLet's consider an agent-based log collector/platform, a service mesh, and a distributed tracing platform!\n\n## Internal\n\n- [harvest-*.db database schema](doc/schema)\n\n## Requirements\n\n- UNIX commands\n  - date\n  - find\n  - grep\n  - head\n  - ls\n  - tail\n  - xargs\n  - zcat\n- sudo\n- SQLite\n\n## WANT\n\n- tag DAG\n- Viewer / Visualizer\n\n## References\n\n- [Hayabusa](https://github.com/hirolovesbeer/hayabusa): A Simple and Fast Full-Text Search Engine for Massive System Log Data\n    - Make simple with a combination of commands.\n    - Full-Text Search Engine using SQLite FTS.\n- [stern](https://github.com/wercker/stern): ⎈ Multi pod and container log tailing for Kubernetes\n    - Multiple Kubernetes log streaming 
architecture.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fk1low%2Fharvest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fk1low%2Fharvest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fk1low%2Fharvest/lists"}