{"id":15440055,"url":"https://github.com/santhosh-tekuri/logflow","last_synced_at":"2025-10-15T15:25:05.548Z","repository":{"id":64304178,"uuid":"201968265","full_name":"santhosh-tekuri/logflow","owner":"santhosh-tekuri","description":"Fast and Lightweight Log processor and forwarder for Kubernetes","archived":false,"fork":false,"pushed_at":"2021-01-15T14:50:00.000Z","size":3886,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-19T20:54:17.832Z","etag":null,"topics":["elasticsearch","forwarder","kubernetes","log","logging"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/santhosh-tekuri.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-08-12T16:29:00.000Z","updated_at":"2022-12-27T09:50:23.000Z","dependencies_parsed_at":"2023-01-15T10:15:25.605Z","dependency_job_id":null,"html_url":"https://github.com/santhosh-tekuri/logflow","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/santhosh-tekuri/logflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/santhosh-tekuri%2Flogflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/santhosh-tekuri%2Flogflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/santhosh-tekuri%2Flogflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/santhosh-tekuri%2Flogflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/santhosh-tekuri","downloa
d_url":"https://codeload.github.com/santhosh-tekuri/logflow/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/santhosh-tekuri%2Flogflow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279086423,"owners_count":26100139,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-15T02:00:07.814Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elasticsearch","forwarder","kubernetes","log","logging"],"created_at":"2024-10-01T19:10:42.612Z","updated_at":"2025-10-15T15:25:05.500Z","avatar_url":"https://github.com/santhosh-tekuri.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Logflow\n\nLogflow exports kubernetes pod logs to Elasticsearch.  \nThe goal of this project is to use minimal cpu (1 to 2%) and minimal memory in comparison to other solutions.  \nIt is written in golang and is lightweight.\n\n## How it works\n\n- watches for any changes to log files in `/var/log/containers`. 
this directory contains symlinks\n  to docker log files\n- if a new log file appears in `/var/log/containers`, resolves it to its realpath\n- it creates a hardlink to the log file in the `/var/log/containers/logflow` directory\n- when docker rotates the log file, it creates a hardlink to the new log file in `/var/log/containers/logflow`\n- because we create hardlinks to log files, no additional disk space is required by logflow, \n  other than a few metadata files in `/var/log/containers/logflow`\n- a new goroutine is started for each pod, which parses the log files in `/var/log/containers/logflow` \n  and exports them to elasticsearch.\n- once a logfile is completely exported, it is deleted from `/var/log/containers/logflow`\n- thus if elasticsearch is reachable, logflow should use disk space only for small metadata files.\n- in case elasticsearch is down, we keep deleting old logfiles from `/var/log/containers/logflow`\n  when docker rotates in a new logfile. you can configure how many additional logfiles can be stored \n  other than what docker keeps on disk with the `maxFiles` property in `logflow.conf` \n  \n## Quickstart\n\nMake sure that kubernetes nodes are using the docker [json-file](https://docs.docker.com/config/containers/logging/json-file/) logging driver.\nyou can check this in the `/etc/docker/daemon.json` file.\n\nTo use the `json-file` logging driver, create `/etc/docker/daemon.json` with the content below and restart docker.\n\n```json\n{\n  \"log-driver\": \"json-file\",\n  \"log-opts\": {\n    \"max-size\": \"10m\",\n    \"max-file\": \"3\" \n  }\n}\n```\n\n**NOTE:**  \n- `max-file` in `/etc/docker/daemon.json` must be greater than `1`. if it is `1`, then\n`logflow` cannot detect log rotation (because docker truncates the file to rotate)\n- make sure `compress` in `/etc/docker/daemon.json` is `disabled` (by default it is disabled)\n\nclone this project, and edit `kustomize/logflow.conf`\n- update `elasticsearch.url`\n- update `json-file.max-file` to the same value as in `/etc/docker/daemon.json`\n- 
leave other options at their defaults\n\nnow deploy logflow into namespace `logflow`:\n\n```shell\n$ kubectl create ns logflow\nnamespace/logflow created\n\n$ kubectl apply -k kustomize\nserviceaccount/logflow created\nclusterrole.rbac.authorization.k8s.io/logflow created\nclusterrolebinding.rbac.authorization.k8s.io/logflow created\nconfigmap/logflow-ff5k2b2t4d created\nservice/elasticsearch created\nservice/kibana created\ndeployment.apps/elasticsearch created\ndeployment.apps/kibana created\ndaemonset.apps/logflow created\n```\n\nthe logs are exported to elasticsearch indexes named with the format `logflow-yyyy-mm-dd`.  \nall log records have 3 mandatory fields: `@timestamp`, `@message` and `@k8s`\n- `@timestamp` is in RFC3339 Nano format\n- `@message` is the log message\n- `@k8s` is a json object with fields:\n    - `namespace` namespace of the pod\n    - `pod` name of the pod\n    - `container_name` name of the container\n    - `container_id` docker container id\n        - you can see the logs of a specific pod instance in kibana by applying a filter on this field\n    - `nodename` name of the node on which it is running\n    - `labels` json object of labels\n        - if a label name contains `.` it is replaced with `_`\n\nyou can add additional fields such as loglevel, threadname etc. to the log record by configuring log parsing as explained below. 
\n\n\nhow to parse a pod's logs is specified by adding the annotation `logflow.io/parser` to the pod.\n\nto parse logs using [regex](https://github.com/google/re2/wiki/Syntax) format:\n```yaml\nannotations:\n  logflow.io/parser: |-\n    format=/^\\[(?P\u003ctimestamp\u003e.*?)\\] (?P\u003cmessage\u003e.*)$/\n    message_key=message\n    timestamp_key=timestamp\n    timestamp_layout=Mon Jan _2 15:04:05 MST 2006\n    multiline_start=/^\\[(?P\u003ctime\u003e.*?)\\] /\n```\n- regex must be enclosed in `/`\n\n- you can test your regex [here](https://play.golang.org/p/J7NJr_nTskK)\n    - edit `line` and `expr` in the opened page and click `Run`\n- group names `?P\u003cGROUPNAME\u003e` in the regex map to log record field names\n- `message_key` is mandatory. it replaces the `@message` value in the log record with the specified regex group match\n- `timestamp_key` replaces the `@timestamp` value in the log record with the specified regex group match\n    - `timestamp_layout` specifies the time format, based on the reference time \"Mon Jan 2 15:04:05 -0700 MST 2006\"\n    - see [this](https://medium.com/@simplyianm/how-go-solves-date-and-time-formatting-8a932117c41c) to understand the `timestamp_layout` format\n- `multiline_start` is a regexp pattern that matches the first line of a multiline log message. this is useful if a log message can extend to more than one line.\n   the loglines which do not match this regexp are treated as part of the most recent log message. note that the regexp in `format` is matched only \n   on the first line, not on the complete multiline log message.\n\n\nto parse logs using json format:\n```yaml\nannotations:\n  logflow.io/parser: |-\n    format=json\n    message_key=message\n    timestamp_key=time\n    timestamp_layout=Mon Jan _2 15:04:05 MST 2006\n```\n\n- `message_key` is mandatory. 
it replaces the `@message` value in the log record with the specified json field value\n- `timestamp_key` replaces the `@timestamp` value in the log record with the specified json field value\n    - `timestamp_layout` specifies the time format, based on the reference time \"Mon Jan 2 15:04:05 -0700 MST 2006\"\n    - see [this](https://medium.com/@simplyianm/how-go-solves-date-and-time-formatting-8a932117c41c) to understand the `timestamp_layout` format\n- top level non-string fields are suffixed with their json type. consider an example where one pod log has an\n  `error` field with a string value and another pod log has an `error` field with an object holding more details. in \n  such cases, elasticsearch throws `mapper_parsing_exception`. to avoid this, logflow renames the `error` field\n  with the object value to `error$obj`. this avoids mapping exceptions to a large extent without additional manual \n  configuration\n\nto exclude logs of a pod:\n```yaml\nannotations:\n  logflow.io/exclude: \"true\"\n```\n\nNote that the annotation value is a boolean, either `true` or `false`, and must be quoted.\n\nIf a pod has multiple containers with different log formats, use the `logflow.io/parser-CONTAINER` annotation\nto target a specific container. For example, to target a container named `nginx`, use the annotation `logflow.io/parser-nginx`\n\nsimilarly, to exclude logs from a specific container use the `logflow.io/exclude-CONTAINER` annotation\n\nNOTE:\n\n- logflow does not watch for changes to the annotation `logflow.io/parser`\n- logflow reads this annotation only when the pod is deployed\n- so any changes to this annotation after the pod is deployed are not reflected\n\n## Performance\n\nAs per my tests, for 10k messages per second:\n- logflow takes 1 to 2% cpu\n- fluentd takes 30 to 40% cpu\n- fluent-bit takes 4 to 5% cpu\n\nBelow are the instructions to run performance tests to compare logflow with fluentd and fluent-bit.  \n\nMake sure that you have at least 4G of memory on each kubernetes node, 
\nbecause we are deploying elasticsearch and kibana.\n\nMake sure that kubernetes nodes are using the docker [json-file](https://docs.docker.com/config/containers/logging/json-file/) logging driver.\nyou can check this in the `/etc/docker/daemon.json` file.\n\nTo use the `json-file` logging driver, create `/etc/docker/daemon.json` with the content below and restart docker.\n\n```json\n{\n  \"log-driver\": \"json-file\",\n  \"log-opts\": {\n    \"max-size\": \"10m\",\n    \"max-file\": \"3\" \n  }\n}\n```\n\n**NOTE:**  `max-file` in `/etc/docker/daemon.json` must be greater than `1`. if it is `1`, then\n`logflow` cannot detect log rotation (because docker truncates the file to rotate)\n\n\nclone this project\n\nselect a worker node, say `worker2`, for the performance test and add the `perf` label to it:\n\n```shell\n$ kubectl get nodes\nNAME      STATUS   ROLES    AGE   VERSION\nmaster    Ready    master   56d   v1.15.0\nworker1   Ready    \u003cnone\u003e   56d   v1.15.0\nworker2   Ready    \u003cnone\u003e   56d   v1.15.0\n\n$ kubectl label nodes worker2 perf=true\nnode/worker2 labeled\n```\n\n### logflow\n\nedit `kustomize/logflow.conf`\n- update `json-file.max-file` to the same value as in `/etc/docker/daemon.json`\n- leave other options at their defaults\n\ninstall `logflow` and wait for pods:\n\n```shell\n$ kubectl apply -k perf/logflow\nnamespace/logflow created\nserviceaccount/logflow created\nclusterrole.rbac.authorization.k8s.io/logflow created\nclusterrolebinding.rbac.authorization.k8s.io/logflow created\nconfigmap/logflow-ff5k2b2t4d created\nservice/elasticsearch created\nservice/kibana created\ndeployment.apps/elasticsearch created\ndeployment.apps/kibana created\ndaemonset.apps/logflow created\n\n$ kubectl -n logflow get po\nNAME                            READY   STATUS    RESTARTS   AGE\nelasticsearch-598959bf8-846l7   1/1     Running   0          17s\nkibana-5d7ff5fd79-vl56h         1/1     Running   0          17s\nlogflow-pwn4s                   1/1     Running   0          
17s\nlogflow-vh885                   1/1     Running   0          17s\nlogflow-xw8sn                   1/1     Running   0          17s\n```\n\nnow run the counter deployment as follows:\n\nthis deployment has a container which produces one log message per millisecond.  \nit has 10 replicas, thus this deployment produces 10k log messages per second.  \nall pods are launched on the perf worker node using `nodeSelector`\n\n```shell\n$ kubectl apply -f perf/counter.yaml\ndeployment.apps/counter created\n```\n\nto see the cpu usage of logflow, ssh to the perf worker node and run the `top` command, press `o` and type `COMMAND=logflow`:  \n\n```shell\nTasks: 123 total,   1 running,  76 sleeping,   0 stopped,   0 zombie\n%Cpu(s):  0.7 us,  0.3 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st\nKiB Mem :  4039536 total,  2561628 free,   313312 used,  1164596 buff/cache\nKiB Swap:        0 total,        0 free,        0 used.  3481144 avail Mem\n\n  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND\n13951 root      20   0  108572   8236   4648 S   1.3  0.4   0:00.51 logflow\n```\n\nyou can see that `logflow` takes 1-2% cpu\n\nNOTE: at startup it may take more cpu than shown above because of old logs, but after some time cpu usage will be low. 
\n\nlet us bring down the setup:\n\n```shell\n$ kubectl delete -f perf/counter.yaml\ndeployment.apps \"counter\" deleted\n\n$ kubectl delete -k perf/logflow\nnamespace \"logflow\" deleted\nserviceaccount \"logflow\" deleted\nclusterrole.rbac.authorization.k8s.io \"logflow\" deleted\nclusterrolebinding.rbac.authorization.k8s.io \"logflow\" deleted\nconfigmap \"logflow-ff5k2b2t4d\" deleted\nservice \"elasticsearch\" deleted\nservice \"kibana\" deleted\ndeployment.apps \"elasticsearch\" deleted\ndeployment.apps \"kibana\" deleted\ndaemonset.apps \"logflow\" deleted\n```\n\n### fluentd\n\nnow install fluentd and wait for pods:\n\n```shell script\n$ kubectl apply -k perf/fluentd\nnamespace/fluentd created\nserviceaccount/fluentd created\nclusterrole.rbac.authorization.k8s.io/fluentd created\nclusterrolebinding.rbac.authorization.k8s.io/fluentd created\nconfigmap/fluentd-m82mm29m42 created\nservice/elasticsearch created\nservice/kibana created\ndeployment.apps/elasticsearch created\ndeployment.apps/kibana created\ndaemonset.apps/fluentd created\n\n$ kubectl -n fluentd get po -o wide\nNAME                            READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES\nelasticsearch-598959bf8-5hbxk   1/1     Running   0          87s   10.244.1.131   worker1   \u003cnone\u003e           \u003cnone\u003e\nfluentd-9lrqt                   1/1     Running   0          87s   10.244.1.130   worker1   \u003cnone\u003e           \u003cnone\u003e\nfluentd-g99xs                   1/1     Running   0          87s   10.244.0.78    master    \u003cnone\u003e           \u003cnone\u003e\nfluentd-z87gs                   1/1     Running   0          87s   10.244.2.125   worker2   \u003cnone\u003e           \u003cnone\u003e\nkibana-5d7ff5fd79-llcq9         1/1     Running   0          87s   10.244.1.129   worker1   \u003cnone\u003e           \u003cnone\u003e\n```\n\nrun the counter deployment:\n\n```shell script\n$ kubectl apply -f 
perf/counter.yaml\ndeployment.apps/counter created\n```\n\nopen a bash shell in the fluentd pod running on the perf node to see cpu usage:\n\n```shell script\n$ kubectl -n fluentd exec -it fluentd-z87gs bash\nroot@fluentd-z87gs:/home/fluent# top\ntop - 13:40:16 up  2:28,  0 users,  load average: 6.47, 3.58, 4.34\nTasks:   5 total,   1 running,   4 sleeping,   0 stopped,   0 zombie\n%Cpu(s):  4.5 us, 10.0 sy,  0.0 ni, 85.0 id,  0.2 wa,  0.0 hi,  0.3 si,  0.0 st\nKiB Mem :  4039536 total,  2293152 free,   522056 used,  1224328 buff/cache\nKiB Swap:        0 total,        0 free,        0 used.  3303532 avail Mem\n\n  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND\n   11 root      20   0  263180 100556   9904 S  31.2  2.5   0:08.95 ruby\n    1 root      20   0   15828   1840   1716 S   0.0  0.0   0:00.01 tini\n    6 root      20   0  197328  52904   9256 S   0.0  1.3   0:01.44 ruby\n   21 root      20   0   27944   3864   3364 S   0.0  0.1   0:00.00 bash\n   28 root      20   0   50136   3940   3304 R   0.0  0.1   0:00.00 top\n```\n\nyou can see that `fluentd` takes 30-40% cpu\n\nlet us bring down the setup:\n\n```shell script\n$ kubectl delete -f perf/counter.yaml\ndeployment.apps \"counter\" deleted\n\n$ kubectl delete -k perf/fluentd\nnamespace \"fluentd\" deleted\nserviceaccount \"fluentd\" deleted\nclusterrole.rbac.authorization.k8s.io \"fluentd\" deleted\nclusterrolebinding.rbac.authorization.k8s.io \"fluentd\" deleted\nconfigmap \"fluentd-m82mm29m42\" deleted\nservice \"elasticsearch\" deleted\nservice \"kibana\" deleted\ndeployment.apps \"elasticsearch\" deleted\ndeployment.apps \"kibana\" deleted\ndaemonset.apps \"fluentd\" deleted\n```\n\n### fluent-bit\n\nnow install fluent-bit and wait for pods:\n\n```shell script\n$ kubectl apply -k perf/fluent-bit/\nnamespace/fluent-bit created\nserviceaccount/fluent-bit created\nclusterrole.rbac.authorization.k8s.io/fluent-bit-read created\nclusterrolebinding.rbac.authorization.k8s.io/fluent-bit-read 
created\nconfigmap/fluent-bit-config-8m7b4c5kth created\nservice/elasticsearch created\nservice/kibana created\ndeployment.apps/elasticsearch created\ndeployment.apps/kibana created\ndaemonset.extensions/fluent-bit created\n```\n\nrun the counter deployment:\n\n```shell script\n$ kubectl apply -f perf/counter.yaml\ndeployment.apps/counter created\n```\n\nto see the cpu usage of fluent-bit, ssh to the perf worker node and run the `top` command, press `o` and type `COMMAND=fluent-bit`:\n\n```shell script\n$ top\ntop - 13:32:17 up 16 min,  3 users,  load average: 11.36, 7.43, 3.41\nTasks: 187 total,   8 running, 125 sleeping,   0 stopped,   2 zombie\n%Cpu(s): 29.0 us, 62.3 sy,  0.0 ni,  4.6 id,  2.0 wa,  0.0 hi,  2.1 si,  0.0 st\nKiB Mem :  4039644 total,   618804 free,  2076472 used,  1344368 buff/cache\nKiB Swap:        0 total,        0 free,        0 used.  1704044 avail Mem\n\n  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND\n 8414 root      20   0  144884  10784   6580 R   4.3  0.3   0:11.85 fluent-bit\n```\n\nyou can see that `fluent-bit` takes 4-5% cpu\n\nlet us bring down the setup:\n\n```shell script\n$ kubectl delete -f perf/counter.yaml\ndeployment.apps \"counter\" deleted\n\n$ kubectl delete -k perf/fluent-bit\nnamespace \"fluent-bit\" deleted\nserviceaccount \"fluent-bit\" deleted\nclusterrole.rbac.authorization.k8s.io \"fluent-bit-read\" deleted\nclusterrolebinding.rbac.authorization.k8s.io \"fluent-bit-read\" deleted\nconfigmap \"fluent-bit-config-8m7b4c5kth\" deleted\nservice \"elasticsearch\" deleted\nservice \"kibana\" deleted\ndeployment.apps \"elasticsearch\" deleted\ndeployment.apps \"kibana\" deleted\ndaemonset.extensions \"fluent-bit\" deleted\n\n$ kubectl label nodes worker2 perf-\nnode/worker2 
labeled\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsanthosh-tekuri%2Flogflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsanthosh-tekuri%2Flogflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsanthosh-tekuri%2Flogflow/lists"}