Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/tel-io/tel

Cloud log framework easily helps to handle 3 log pillars (Logging, tracing and monitoring). All-In-One. Inspired by open telemetry.
https://github.com/tel-io/tel

distributed-logging-system logging monitoring opentelemetry opentracing tracing

Last synced: about 1 month ago
JSON representation

Cloud log framework easily helps to handle 3 log pillars (Logging, tracing and monitoring). All-In-One. Inspired by open telemetry.

Awesome Lists containing this project

README

        

= Telemetry.V2 Otel Protocol

Framework which aims to ease logging affair: `Logs`, `Traces` and `Metrics` .

V2 version launch usage of OpenTelemetry specification for all logging directions.
This mean that all logging propagators uses `OTEL` protocol.

Tel use `zap.Logger` as the heart of system.
That why it's pass all zap functions through.

== Motto

Ony context all logs.

Decrease external dependencies as match as possible.

== Features

.All-In-One
Library establish connection via GRPC OTLP protocol with `opentelemetry-collector-contrib` (official OTEL server) and send `logs`, `traces` and `metrics`.
Collector by the way distribute them to `Loki`, `Tempo` and `Prometheus` or any other services which you prefer and which collector support.
Furthermore, we prepared for you working `dashboards` in `./__grafana` folder which created for our `middlewares` for most popular servers and clients.

.Logs
Our goal to support https://grafana.com/docs/loki/latest/logql/log_queries/#logfmt[logfmt] format for `Loki` viewing.
Via simple `zap` library interface.

By the way, you can enrich log attributes which should be written only when whey would really need

[source,go]
----
// create copy of ctx which we enrich with log attributes
cxt := tel.Global().Copy().Ctx()

// pass ctx in controller->store->root layers and enrich information
err := func (ctx context.Context) error{
tel.FromCtx(ctx).PutAttr(extract...)
return return fmt.Errorf("some error")
}(ctx)

// and you write log message only when it really needed
// with all putted attribute via ctx from ALL layers earlier
// No need to look previous info/debug messages
// All needed information in one message with all attributes which you already added, but which would be writen only when you really do call `Error()`, `Info()`, `Debug()` and so on
//
// for example: only when you got error
if err != nil{
tel.FromCtx(ctx).Error("error happened", tel.Error(err))
}

----

.Trace
Library simplify usage with creating `Spans` of trace

Also, you can send not only logs and also encroach `trace events`

[source,go]
----
span, ctx := tel.StartSpanFromContext(req.Context(), "my trace")
defer span.End()

tel.FromCtx(ctx).Info("this message will be saved both in LogInstance and trace",
// and this will come to the trace as attribute
tel.Int("code", errCode))
----

.Metrics
Simplify working with metrics
[source,go]

----
m := tel.Global().Meter("github.com/MyRepo/MyLib/myInstrumenation")

requestLatency, err := m.SyncFloat64().Histogram("demo_client.request_latency",
instrument.WithDescription("The latency of requests processed"))
if err != nil {
t.Fatal("metric load error", tel.Error(err))
}

...
start := time.Now()
....

ms := float64(time.Now().Sub(start).Microseconds())

requestLatency.Record(ctx, ms,
attribute.String("userID", "e64916d9-bfd0-4f79-8ee3-847f2d034d20"),
attribute.Int("orderID", 1),
)
----

.Middleware

* Recovery flow
* Instantiate new copy of `tel` for further handler
* Basic metrics with respective dashboard for grafana
* Trace propagation
** client part - send (inject) current trace span to the server
** server part - read (extract) trace and create new trace child one (or absolutly new if no trace info was provided or this info where not properly wrapped via propagator protocol of OTEL specification)

== Logging stack

Logging data exported via `OTEL's` GRPC protocol. `tel` developed to trespass it via https://github.com/open-telemetry/opentelemetry-collector[open-telemetry collector] which should route log data up to any desired log receivers.

Keep in mind that collector has plugin version https://github.com/open-telemetry/opentelemetry-collector-contrib[collector contrib] - this is gateway-adapter to numerous protocols which not yet support `OTEL`, for example grafana loki.

For instance, you can use `opentelemetry-collector-contrib` as `tel` receiver and route logging data to `Grafana Loki`, trace data to `Grafana Tempo` and metric data to `Prometheus + Grafana ;)`

=== Grafana references feature

==== loki to tempo

`tel` approach to put `traceID` field with actual trace ID.
All our middlewares should do that or developer should do it by himself

Just call `UpdateTraceFields` before write some logs
[source,go]

----
tel.UpdateTraceFields(ctx)
----

understood grafana should setup `derivedFields` for Loki data source
[source,yaml]

----
- name: Loki
type: loki
url: http://loki:3100
uid: loki
jsonData:
derivedFields:
- datasourceUid: tempo
matcherRegex: "traceID=(\\w+)"
name: trace
url: '$${__value.raw}'
----

==== tempo to loki

We match `tempo` with `loki` by `service_name` label.
All logs should contain traceID by any key form and `service_name`.
In grafana tempo datasource should be configured with `tracesToLogs`

==== prometheus to loki
[source,yaml]

----
- name: Tempo
type: tempo
access: proxy
orgId: 1
url: http://tempo:3200
basicAuth: false
isDefault: false
version: 1
editable: false
apiVersion: 1
uid: tempo
jsonData:
nodeGraph:
enabled: true
tracesToLogs:
datasourceUid: loki
filterBySpanID: false
filterByTraceID: true
mapTagNamesEnabled: false
tags:
- service_name
----

== Install

[source,bash]
----
go get github.com/tel-io/tel/v2@latest
----

=== collector

OTEL collector configuration (labels) part of setup, this mean if you not properly setup it - you wouldn't be able to see appropriate result

[source,yaml]
----
include::docker/otel-collector-config.yaml[]
----

== Features

* `OTEL` logs implementation

== Env

.OTEL_SERVICE_NAME
service name

`type`: string

.NAMESPACE
project namespace

`type`: string

.DEPLOY_ENVIRONMENT
ENUM: dev, stage, prod

`type`: string

.LOG_LEVEL
info log

`type`: string
NOTE: debug, info, warn, error, dpanic, panic, fatal

.LOG_ENCODE
valid options: `console` and `json` or "none"

none - disable print to console (only OTEL or critical errors)

.DEBUG
for IsDebug() function

`type`: bool

.MONITOR_ENABLE
default: `true`

.MONITOR_ADDR
address where `health`, `prometheus` would be listen

NOTE: address logic represented in net.Listen description

.OTEL_ENABLE
default: `true`

.OTEL_COLLECTOR_GRPC_ADDR
Address to otel collector server via GRPC protocol

.OTEL_EXPORTER_WITH_INSECURE
With insecure ...

.OTEL_ENABLE_COMPRESSION
default: `true`

Enables gzip compression for grpc connections

.OTEL_METRIC_PERIODIC_INTERVAL_SEC
default: "15"

Interval metrics gathered

.OTEL_COLLECTOR_TLS_SERVER_NAME
Check server certificate DNS name given from server.

Disable `OTEL_EXPORTER_WITH_INSECURE` if set

.LOGGING_OTEL_CLIENT
default: `false`

required `OTEL_ENABLE` = true

Inject logger adapter to otel library related to grpc client and get log information related to this transport

.LOGGING_OTEL_PROCESSOR
default: `false`

required `OTEL_ENABLE` = true

Inject logger adapter to otel processor library related to collectors behaviour

.LOGS_ENABLE_RETRY
default: `false`

Enable retrying to send logs to collector.

.LOGS_SYNC_INTERVAL
default: `1s`

Limit how often logs are flushed with level.Error.

Example: 1s means allowed 1 flush per second if logs have level.Error.

.LOGS_MAX_MESSAGE_SIZE
default: `256`

Limit message size. If limit is exceeded, message is truncated.

.LOGS_MAX_MESSAGES_PER_SECOND
default: `100`

Limit rate of messages per second. If limit is exceeded, warning is logged and messages are dropped. Value 0 disables limit.

.LOGS_MAX_LEVEL_MESSAGES_PER_SECOND
default: ``

The same as LOGS_MAX_MESSAGES_PER_SECOND but allows to configure limit per level.

Value format: =,=. Ex: LOGS_MAX_LEVEL_MESSAGES_PER_SECOND="error=0,info=100"

.TRACES_ENABLE_RETRY
default: `false`

Enable retrying to send traces to collector.

.TRACES_SAMPLER
default: `statustraceidratio:0.1`

Set sampling strategy. There are options:
- never
- always
- traceidratio:
- statustraceidratio:

where is required and valid floating point number from 0.0 to 1.0

.TRACES_ENABLE_SPAN_TRACK_LOG_MESSAGE
default: `false`

Enable adding all log messages to active span as event.

.TRACES_ENABLE_SPAN_TRACK_LOG_FIELDS
default: `true`

Enable adding all log fields to active span as attributes.

.TRACES_CARDINALITY_DETECTOR_ENABLE
default: `true`

Enable cardinality check for span names.

.TRACES_CARDINALITY_DETECTOR_MAX_CARDINALITY
default: `0`

Limit cardinality of span's attributes. Not used, so default value is 0.

.TRACES_CARDINALITY_DETECTOR_MAX_INSTRUMENTS
default: `500`

Limit the number of unique span names.

.TRACES_CARDINALITY_DETECTOR_DIAGNOSTIC_INTERVAL
default: `10m`

Enable diagnostic loop that checks for cardinality violations and logs a warning.

You can disable it by setting the value to 0.

.METRICS_ENABLE_RETRY
default: `false`

Enable retrying to send metrics to collector.

.METRICS_CARDINALITY_DETECTOR_ENABLE
default: `true`

Enable cardinality check for metrics' labels.

.METRICS_CARDINALITY_DETECTOR_MAX_CARDINALITY
default: `100`

Limit cardinality of metric's labels. If limit is exceeded, metric is ignored, but the previous metrics work as before

.METRICS_CARDINALITY_DETECTOR_MAX_INSTRUMENTS
default: `500`

Limit the number of unique metric names (without labels. only name).

.METRICS_CARDINALITY_DETECTOR_DIAGNOSTIC_INTERVAL
default: `10m`

Enable diagnostic loop that checks for cardinality violations and logs a warning.

You can disable it by setting the value to 0.

.OTEL_COLLECTOR_TLS_CA_CERT
TLS CA certificate body

.OTEL_COLLECTOR_TLS_CLIENT_CERT
TLS client certificate

.OTEL_COLLECTOR_TLS_CLIENT_KEY
TLS client key

.OTEL_RESOURCE_ATTRIBUTES
This optional variable, handled by open-telemetry SDK.
Separator is semicolon.
Put additional resources variables, very suitable!

== ToDo

* [ ] Expose health check to specific metric
* [ ] Duplicate trace messages for root - ztrace.New just add to chain tree

== Usage

Tale look in `example/demo` folder.