An open API service indexing awesome lists of open source software.

https://github.com/dennyzhang/challenges-fluent-bit

Deep dive into fluent-bit
https://github.com/dennyzhang/challenges-fluent-bit

fluent-bit logging

Last synced: 3 months ago
JSON representation

Deep dive into fluent-bit

Awesome Lists containing this project

README

          

* org-mode configuration :noexport:
#+STARTUP: overview customtime noalign logdone showall
#+TITLE: Deep Dive Into Fluent-Bit
#+DESCRIPTION:
#+KEYWORDS:
#+AUTHOR: Denny Zhang
#+EMAIL: denny@dennyzhang.com
#+TAGS: noexport(n)
#+PRIORITIES: A D C
#+OPTIONS: H:3 num:t toc:nil \n:nil @:t ::t |:t ^:t -:t f:t *:t <:t
#+OPTIONS: TeX:t LaTeX:nil skip:nil d:nil todo:t pri:nil tags:not-in-toc
#+EXPORT_EXCLUDE_TAGS: exclude noexport
#+SEQ_TODO: TODO HALF ASSIGN | DONE BYPASS DELEGATE CANCELED DEFERRED
#+LINK_UP:
#+LINK_HOME:
* Summary
#+BEGIN_HTML
linkedin
github
slack



PRs Welcome
#+END_HTML
* Fluent-Bit
File me [[https://github.com/DennyZhang/challenges-fluent-bit/issues][Issues]] or star [[https://github.com/DennyZhang/challenges-fluent-bit][this repo]]

See more challenges from Denny: [[https://github.com/topics/denny-challenges][#denny-challenges]]

** Deep Dive Into Source Code
*** How fluent-bit kubernetes filter works
It calls something like: https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/%s/pods/%s

The logic is like this:

#+BEGIN_EXAMPLE
kubernetes.c:cb_kube_filter ->
kube_meta.c:flb_kube_meta_get/get_and_merge_meta ->
kube_meta.c:get_api_server_info ->
kube_meta.h:FLB_KUBE_API_FMT
#+END_EXAMPLE

https://github.com/fluent/fluent-bit/blob/master/plugins/filter_kubernetes/kube_meta.h#L54

https://github.com/fluent/fluent-bit/blob/master/plugins/filter_kubernetes/kube_meta.c#L590

https://github.com/fluent/fluent-bit/blob/master/plugins/filter_kubernetes/kube_meta.c#L138

https://github.com/fluent/fluent-bit/blob/master/plugins/filter_kubernetes/kubernetes.c#L416

**** How to run it manually
- Get the token for your namespace account
https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/

- Run curl test
#+BEGIN_EXAMPLE
# curl --header "Authorization: Bearer $TOKEN" -k https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/oratos/pods/dummy-input --insecure
curl --header "Authorization: Bearer $TOKEN" -k https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/oratos/pods/dummy-input --insecure
{
"kind": "Pod",
"apiVersion": "v1",
"metadata": {
"name": "dummy-input",
"namespace": "oratos",
"selfLink": "/api/v1/namespaces/oratos/pods/dummy-input",
"uid": "f57ec8e3-9a87-11e8-8dcd-080027dbaae1",
"resourceVersion": "8496",
"creationTimestamp": "2018-08-07T21:22:16Z",
"annotations": {
"kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"metadata\":{\"annotations\":{},\"name\":\"dummy-input\",\"namespace\":\"oratos\"},\"spec\":{\"containers\":[{\"args\":[\"tail\",\"-f\",\"/dev/termination-log\"],\"image\":\"busybox\",\"name\":\"dummy-input\"}]}}\n"
}
},
"spec": {
"volumes": [
{
"name": "default-token-xk9qq",
"secret": {
"secretName": "default-token-xk9qq",
"defaultMode": 420
}
}
],
"containers": [
{
"name": "dummy-input",
"image": "busybox",
"args": [
"tail",
"-f",
"/dev/termination-log"
],
"resources": {

},
"volumeMounts": [
{
"name": "default-token-xk9qq",
"readOnly": true,
"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
}
],
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"imagePullPolicy": "Always"
}
],
"restartPolicy": "Always",
"terminationGracePeriodSeconds": 30,
"dnsPolicy": "ClusterFirst",
"serviceAccountName": "default",
"serviceAccount": "default",
"nodeName": "minikube",
"securityContext": {

},
"schedulerName": "default-scheduler",
"tolerations": [
{
"key": "node.kubernetes.io/not-ready",
"operator": "Exists",
"effect": "NoExecute",
"tolerationSeconds": 300
},
{
"key": "node.kubernetes.io/unreachable",
"operator": "Exists",
"effect": "NoExecute",
"tolerationSeconds": 300
}
]
},
"status": {
"phase": "Running",
"conditions": [
{
"type": "Initialized",
"status": "True",
"lastProbeTime": null,
"lastTransitionTime": "2018-08-07T21:22:16Z"
},
{
"type": "Ready",
"status": "True",
"lastProbeTime": null,
"lastTransitionTime": "2018-08-07T21:22:21Z"
},
{
"type": "PodScheduled",
"status": "True",
"lastProbeTime": null,
"lastTransitionTime": "2018-08-07T21:22:16Z"
}
],
"hostIP": "10.0.2.15",
"podIP": "172.17.0.7",
"startTime": "2018-08-07T21:22:16Z",
"containerStatuses": [
{
"name": "dummy-input",
"state": {
"running": {
"startedAt": "2018-08-07T21:22:20Z"
}
},
"lastState": {

},
"ready": true,
"restartCount": 0,
"image": "busybox:latest",
"imageID": "docker-pullable://busybox@sha256:cb63aa0641a885f54de20f61d152187419e8f6b159ed11a251a09d115fdff9bd",
"containerID": "docker://66f701a981bc2fa0db08fe9cdaf80468d2f7398c95db34e7502f839a909303d5"
}
],
"qosClass": "BestEffort"
}
}
#+END_EXAMPLE
*** Sample message for fluent-bit kubernetes filter
#+BEGIN_EXAMPLE
key: log, value: key: time, value: 2018-08-8T18:16:26.000979098Z
key: stream, value: stdout
key: time, value: 2018-08-08T18:16.27.002369384Z
key: kubernetes, value: map[pod_name:fluent-bit-n588c namespace_name: oratos pod_id:a987f335-9b36-11e8-9fa9-080027b477ce labels:map[controller-revision-hash:545891415 k8s-app:logging-agent kubernetes.io/cluster-service:true pod-template-generation:1 version:v1] host:minikube container_name:fluent-bit docker_id:5f5bcedc9a98d6c4705632cdb55d9bcb572b7fc80dbb1da3e440d092d56ea4f5]
#+END_EXAMPLE
*** For tail_file input plugin of fluent-bit, could I specify file path like: /var/log/*/apache/*.log?
It eventually use *glob* function of C. So probably, yes.

tail_scan.c:flb_tail_scan -> tail_scan.c:do_glob

https://github.com/fluent/fluent-bit/blob/master/plugins/in_tail/tail_scan.c#L150

https://github.com/fluent/fluent-bit/blob/master/plugins/in_tail/tail_scan.c#L202
*** How fluent-bit notice a new log file creation or deletion?
It rescan the log folder every 10 seconds
#+BEGIN_EXAMPLE
DennyZhang [11:40 PM]
Hi XXX

May I ask one question about tail input plugin in fluent-bit

We’re using below input to parse k8s pod log files.
```[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10```

Whenever developers start or delete a pod, /var/log/containers will create or delete a log file.

*We are wondering what the behavior fluent-bit would be*
```1. Whether we need to restart or reload fluent-bit, if that happens
2. What's the latency when the log file are created or deleted, but fluent-bit haven't detected.```

Checking the code, here is my understanding. Would you help us to confirm?
```1. When tail input plugin loads, it will register a timer to rescan the log folder. (in_tail_init function in tail.c. https://github.com/fluent/fluent-bit/blob/d2d3d363c6f852f155d45566e1c4155024327913/plugins/in_tail/tail.c#L221-L225)
2. The rescan interval is every 60 seconds (https://github.com/fluent/fluent-bit/blob/d2d3d363c6f852f155d45566e1c4155024327913/plugins/in_tail/tail.h#L38)
3. Whenever developers create a Pod, the associate log file will be detected after 60 seconds. And no service restart is required
4. Whenever developers delete a Pod, file read will run into exception. Thus no more action will be required. (https://github.com/fluent/fluent-bit/blob/d2d3d363c6f852f155d45566e1c4155024327913/plugins/in_tail/tail_file.c#L658)```

XXX [7:58 PM]
Hi,

if I am not wrong when a Pod get's deleted, the log files for that container persists for a period of time. Now the re-scan of the path to find new logs happens by default every 60 seconds or based in the value of "Refresh_Interval" (which you have set to 10 seconds).
#+END_EXAMPLE
* More Resources
License: Code is licensed under [[https://www.dennyzhang.com/wp-content/mit_license.txt][MIT License]].
#+BEGIN_HTML

linkedin
github
slack
#+END_HTML
* [#A] fluent bit :noexport:
#+BEGIN_EXAMPLE
https://fluentbit.io/documentation/current/installation/docker.html

docker run -t -d --name fluent-bit -p 2020:2020 -p 24224:24224 -v fluent-bit.conf:/etc/fluent-bit.conf --entrypoint=/bin/bash fluent/fluent-bit:0.13

docker exec -it fluent-bit bash

docker cp fluent-test.conf fluent-bit:/root/fluent-test.conf
/fluent-bit/bin/fluent-bit -c /root/fluent-test.conf

docker stop fluent-bit; docker rm fluent-bit
#+END_EXAMPLE
** # --8<-------------------------- separator ------------------------>8-- :noexport:
** HALF fluent bit restart
** HALF how fluent bit monitor dynamic input files
** # --8<-------------------------- separator ------------------------>8-- :noexport:
** TODO fluent bit compatible for promethus
** TODO fluent bit performance metrics
** TODO fluent bit multiple syslog output
** TODO enable fluent-bit buffering with filesystem backend
https://fluentbit.io/documentation/0.13/getting_started/buffer.html
** # --8<-------------------------- separator ------------------------>8-- :noexport:
** TODO verify endpoint restart: whether fluent bit agent will auto healed
** TODO fluent bit: how to detect backend status is error
** # --8<-------------------------- separator ------------------------>8-- :noexport:
** DONE hello world with docker
CLOSED: [2018-06-27 Wed 00:35]
*** [#A] run as a container
https://fluentbit.io/documentation/current/installation/docker.html

docker run -t -d --name fluent-bit -p 2020:2020 -p 24224:24224 -v fluent-bit.conf:/etc/fluent-bit.conf --entrypoint=/bin/bash fluent/fluent-bit:0.13

docker exec -it fluent-bit bash

docker cp fluent-bit.conf fluent-bit:/root/fluent-bit.conf
/fluent-bit/bin/fluent-bit -c /root/fluent-bit.conf

docker stop fluent-bit; docker rm fluent-bit
*** simple test
docker pull fluent/fluent-bit:0.13
docker run -ti fluent/fluent-bit:0.13 /fluent-bit/bin/fluent-bit -i cpu -o stdout -f 1
*** more customization
**** fluent-bit.conf
cat > fluent-bit.conf << EOF
[SERVICE]
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_PORT 2020

[INPUT]
Name forward

# The Listen interface, by default we listen on all of them
Listen 0.0.0.0

# Default TCP listener port
Port 24224

# Buffer (Kilobytes)
# ------------------
# Specify the size of the receiver buffer. Incoming records
# must be inside this limit. By default 512KB.
Buffer 512000

[OUTPUT]
Name file
Match *
Path /tmp/output.txt
EOF
**** run daemon service
# docker run -p 2020:2020 -p 24224:24224 --name fluent-bit -v /Users/zdenny/Dropbox/private_data/work/vmware/fluent-bit/fluent-bit.conf:/etc/fluent-bit.conf -ti fluent/fluent-bit:0.13 /fluent-bit/bin/fluent-bit -c /etc/fluent-bit.conf
docker run -t -d --name fluent-bit -p 2020:2020 -p 24224:24224 -v /Users/zdenny/Dropbox/private_data/work/vmware/fluent-bit/fluent-bit.conf:/etc/fluent-bit.conf --entrypoint=/bin/bash fluent/fluent-bit:0.13

docker exec -it fluent-bit bash

/fluent-bit/bin/fluent-bit -c /etc/fluent-bit.conf

nc localhost 24224
**** verify health check
https://fluentbit.io/documentation/current/configuration/monitoring.html

curl -s http://127.0.0.1:2020/api/v1/metrics
**** check log
docker exec -it fluent-bit tail /tmp/output.txt
**** destroy
docker stop fluent-bit; docker rm fluent-bit
** BYPASS hello world doens't work: doesn't work well in mac OS
CLOSED: [2018-06-27 Wed 00:33]
cat > fluent-bit.conf << EOF
[SERVICE]
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_PORT 2020

[INPUT]
Name forward

# The Listen interface, by default we listen on all of them
Listen 0.0.0.0

# Default TCP listener port
Port 24224

# Buffer (Kilobytes)
# ------------------
# Specify the size of the receiver buffer. Incoming records
# must be inside this limit. By default 512KB.
Buffer 512000

[OUTPUT]
Name stdout
Match *
EOF

bin/fluent-bit -c fluent-bit.conf
** DONE Concept Routing Tag/Match
CLOSED: [2018-06-26 Tue 23:45]
https://fluentbit.io/documentation/current/getting_started/routing.html

Tag is a human-readable indicator that helps to identify the data source.

If some data have a Tag that don't have a match upon routing time, the data it's deleted.
#+BEGIN_EXAMPLE
[INPUT]
Name cpu
Tag my_cpu

[INPUT]
Name mem
Tag my_mem

[OUTPUT]
Name es
Match my_c*

[OUTPUT]
Name stdout
Match my_mem
#+END_EXAMPLE
** DONE use fluent bit to monitor http endpoint
CLOSED: [2018-06-26 Tue 23:31]
https://fluentbit.io/documentation/current/input/health.html

#+BEGIN_EXAMPLE
zdenny-a01:~ zdenny$ nc -l 8080
zdenny-a01:~ zdenny$
#+END_EXAMPLE

#+BEGIN_EXAMPLE
zdenny-a01:build zdenny$ fluent-bit -i health://127.0.0.1:8080 -o stdout
Fluent-Bit v0.14.0
Copyright (C) Treasure Data

[2018/06/26 23:30:41] [ info] [engine] started (pid=23405)
[2018/06/26 23:30:42] [error] [io] TCP connection failed: 127.0.0.1:8080 (Connection refused)
[2018/06/26 23:30:43] [error] [io] TCP connection failed: 127.0.0.1:8080 (Connection refused)
[2018/06/26 23:30:44] [error] [io] TCP connection failed: 127.0.0.1:8080 (Connection refused)
[2018/06/26 23:30:45] [error] [io] TCP connection failed: 127.0.0.1:8080 (Connection refused)
[0] health.0: [1530081042.000000000, {"alive"=>false}]
[1] health.0: [1530081043.000000000, {"alive"=>false}]
[2] health.0: [1530081044.000000000, {"alive"=>false}]
[3] health.0: [1530081045.000000000, {"alive"=>false}]
[2018/06/26 23:30:47] [error] [io] TCP connection failed: 127.0.0.1:8080 (Connection refused)
[2018/06/26 23:30:48] [error] [io] TCP connection failed: 127.0.0.1:8080 (Connection refused)
[2018/06/26 23:30:49] [error] [io] TCP connection failed: 127.0.0.1:8080 (Connection refused)
[0] health.0: [1530081046.000000000, {"alive"=>true}]
[1] health.0: [1530081047.000000000, {"alive"=>false}]
[2] health.0: [1530081048.000000000, {"alive"=>false}]
[3] health.0: [1530081049.000000000, {"alive"=>false}]
[4] health.0: [1530081050.000000000, {"alive"=>true}]
[2018/06/26 23:30:51] [error] [io] TCP connection failed: 127.0.0.1:8080 (Connection refused)
[2018/06/26 23:30:52] [error] [io] TCP connection failed: 127.0.0.1:8080 (Connection refused)
[2018/06/26 23:30:53] [error] [io] TCP connection failed: 127.0.0.1:8080 (Connection refused)
^C[engine] caught signal (SIGINT)
[2018/06/26 23:30:53] [ info] [input] pausing health.0
#+END_EXAMPLE
** DONE Enable a customized flent bit plugin
CLOSED: [2018-06-27 Wed 21:05]
https://github.com/fluent/fluent-bit/blob/master/GOLANG_OUTPUT_PLUGIN.md

** DONE fluent bit log error message
CLOSED: [2018-06-28 Thu 22:22]
flb_info("[out_syslog] addr=%s", ctx->addr);

fprintf(stderr, "Here6\n");

flb_debug("[out_kafka_rest] host=%s port=%i",
ins->host.name, ins->host.port);
** DONE fluent-bit log to file
CLOSED: [2018-07-02 Mon 17:12]
/bin/fluent-bit -i cpu -o file -p path=/tmp/output.txt -f 1
** # --8<-------------------------- separator ------------------------>8-- :noexport:
** TODO backpressure
** TODO loadbalancing
** TODO syslog tls
** TODO output retry
** TODO output to syslog
** TODO fluentbit backpressure
https://fluentbit.io/documentation/0.13/configuration/backpressure.html
** TODO fluentbit support client side loadbalancing / failover
https://github.com/fluent/fluent-bit/issues/203#issuecomment-389579836

** useful link
http://edsiper.linuxchile.cl/blog/
** HALF fluent bit multiple output: https://github.com/fluent/fluent-bit/issues/305
github/fluent-bit/conf
** # --8<-------------------------- separator ------------------------>8-- :noexport:
** DONE fluent-bit mechanism: retry when output plugin keeps returning FLB_RETRY
CLOSED: [2018-08-03 Fri 09:59]
Last day, Warren and I were wondering what retry logic fluent-bit enginee would be.

When the engine tries flush one message, but our output plugin returns FLB_RETRY. Maybe the syslog endpoint is unavailable, let's say that.

When it retries, our plugin keep returning FLB_RETRY. What FB will do? Keep retrying, or drop it with some max retries mechanisms?

After checking the source code, it enforces with max retries mechanism. And default max retries is 0.
So it will drop it, when we return FLB_RETRY again.

We can customize the max retries via retry_limit property.

After checking the source code, it enforces with max retries mechanism. But default max retries is 0.

So *it will drop the message*, when we return FLB_RETRY again.

We can customize the max retries via retry_limit property.

1. retry_limit configuration
https://github.com/fluent/fluent-bit/blob/ed4b2f09d68f1b71bed53aaa37a048700adc2b5d/src/flb_output.c#L349-L364
2. When output returns retry, FB engine create a retry task.
https://github.com/fluent/fluent-bit/blob/ed4b2f09d68f1b71bed53aaa37a048700adc2b5d/src/flb_engine.c#L177-L188
3. And eash retry task has attribute of task->retries. And each output plugin has attribute of retry_limit.
https://github.com/fluent/fluent-bit/blob/ed4b2f09d68f1b71bed53aaa37a048700adc2b5d/src/flb_task.c#L88-L138
4. Based on task->retries and plugin->retry_limit, it determines whether keep trying or return error
** TODO [#A] How to integrate fluentd with vRealize Log
** TODO [#A] try kube-fluentd-operator: https://github.com/vmware/kube-fluentd-operator

** TODO reload fluentbit
** TODO try k8s fluentd-operator
** TODO /Users/zdenny/Dropbox/private_data/work/vmware/fluent-bit/plugins/filter_kubernetes/kubernetes.c
** TODO fluentd-kubernetes-daemonset: https://github.com/fluent/fluentd-kubernetes-daemonset
** TODO fluent bit log to multiple endpoints
* fluentd :noexport:
** how fluentd get kubernetes information
https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter
#+BEGIN_EXAMPLE
```{
"log": "2015/05/05 19:54:41 \n",
"stream": "stderr",
"docker": {
"id": "df14e0d5ae4c07284fa636d739c8fc2e6b52bc344658de7d3f08c36a2e804115",
}
"kubernetes": {
"host": "jimmi-redhat.localnet",
"pod_name":"fabric8-console-controller-98rqc",
"pod_id": "c76927af-f563-11e4-b32d-54ee7527188d",
"container_name": "fabric8-console-container",
"namespace_name": "default",
"namespace_id": "23437884-8e08-4d95-850b-e94378c9b2fd",
"namespace_annotations": {
"fabric8.io/git-commit": "5e1116f63df0bac2a80bdae2ebdc563577bbdf3c"
},
"namespace_labels": {
"product_version": "v1.0.0"
},
"labels": {
"component": "fabric8Console"
}
}
}```

At least from this fluentd k8s filter plugin, I don’t see more meta data compared to fluent-bit
Let’s see
#+END_EXAMPLE
* # --8<-------------------------- separator ------------------------>8-- :noexport: