Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-observability
Awesome observability page
https://github.com/adriannovegil/awesome-observability
-
3. Collect
-
Metrics
- OpenLLMetry - Open-source observability for your LLM application, based on OpenTelemetry.
- OpenLLMetry for Javascript - Sister project to OpenLLMetry, but in TypeScript. Open-source observability for your LLM application, based on OpenTelemetry.
- OpenLLMetry for Go - Sister project to OpenLLMetry, but in Go. Open-source observability for your LLM application, based on OpenTelemetry.
- Kuberhealthy - Kubernetes operator for synthetic monitoring and continuous process verification.
- ingraind - Security monitoring agent built around RedBPF for complex containerized environments and endpoints.
- ctop - Top-like interface for container metrics.
- cAdvisor - cAdvisor (Container Advisor) provides container users an understanding of the resource usage and performance characteristics of their running containers.
- Node-exporter - Prometheus stack, Exporter for machine metrics.
- Tcollector - Data collection framework for OpenTSDB.
- Netflix Vector - An on-host performance monitoring framework which exposes hand-picked, high-resolution metrics to every engineer's browser.
- Express State Metrics - Simple, self-hosted module based on Socket.io and Chart.js to report realtime server metrics for Express-based node servers.
- Kube State Metrics - The kube-state-metrics is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects.
- MyPerf4J - High performance Java APM. Powered by ASM. Try it. Test it. If you feel it's better, use it.
- SkyAPM-dotnet - SkyAPM-dotnet provides a native agent for C# and the .NET Standard platform, with help from the Apache SkyWalking committer team.
- pktvisor - Observability agent for summarizing high volume, information dense data streams down to lightweight, immediately actionable observability data directly at the edge.
- OpenTelemetry - OpenTelemetry is made up of an integrated set of APIs and libraries as well as a collection mechanism via an agent and collector.
- top - Allows users to monitor processes and system resource usage on Linux. It is one of the most useful tools in a sysadmin's toolbox, and it comes pre-installed on nearly every distribution.
- htop - Command line utility that allows you to interactively monitor your system's vital resources or server's processes in real time.
- OpenCensus - OpenCensus is a set of libraries for various languages that allow you to collect application metrics and distributed traces, then transfer the data to a backend of your choice in real time.
- OpenTracing - Vendor-neutral APIs and instrumentation for distributed tracing.
- OpenMetrics - An effort to create an open standard for transmitting metrics at scale, with support for both text representation and Protocol Buffers.
- Micrometer - Micrometer provides a simple facade over the instrumentation clients for the most popular monitoring systems, allowing you to instrument your JVM-based application code without vendor lock-in. Think SLF4J, but for metrics.
- Performance Co-Pilot - Performance Co-Pilot is a system performance analysis toolkit.
- Kamon - Monitoring applications running on the JVM.
- MyScale Telemetry - Tool designed to enhance the observability of LLM applications by capturing trace data from LangChain-based applications and storing it in MyScaleDB or ClickHouse.
-
Events & Problems
- kubernetes-event-exporter - This tool allows exporting the often missed Kubernetes events to various outputs so that they can be used for observability or alerting purposes.
- kspan - Turning Kubernetes Events into spans.
- KubeEye - KubeEye aims to find various problems on Kubernetes, such as application misconfiguration (using Polaris), unhealthy cluster components, and node problems (using Node-Problem-Detector).
-
Tracing
- Sleuth - Spring Cloud Sleuth implements a distributed tracing solution for Spring Cloud, borrowing heavily from Dapper, Zipkin and HTrace.
- inspectIT Ocelot - Java agent for collecting performance, tracing and business data.
-
Logging
- mTAIL - Windows program that extracts internal monitoring data from application logs for collection in a time series database.
- Elastic Beats - Lightweight shippers for Elasticsearch & Logstash, Elastic stack.
-
-
4. Load Generators and Synthetic Traffic
-
Events & Problems
- Taurus - Taurus relies on JMeter, Gatling, Locust.io, Grinder and Selenium WebDriver as its underlying tools. Free and open source under Apache 2.0 License.
- Vegeta - HTTP load testing tool built out of a need to drill HTTP services with a constant request rate. It can be used both as a command line utility and a library.
- Trunks - Trunks, like every son, is derived from the father Vegeta with some enhanced skills.
- Yandex Tank - Yandex.Tank is an extensible open source load testing tool for advanced Linux users which is especially good as a part of an automated load testing suite.
- ghz - Simple gRPC benchmarking and load testing tool inspired by hey and grpcurl.
- Locust - Locust is an easy-to-use, distributed, user load testing tool. It is intended for load-testing web sites (or other systems) and figuring out how many concurrent users a system can handle.
- Pandora - Pandora is a high-performance load generator written in Go. It has built-in HTTP(S) and HTTP/2 support, and you can write your own load scenarios in Go, compiling them just before your test.
- GoReplay - Open-source tool for capturing and replaying live HTTP traffic into a test environment in order to continuously test your system with real data.
- BFG - A modular tool and framework for load generation that supports HTTP/2.
- Bender - Bender makes it easy to build load testing applications for services using protocols like HTTP, Thrift, Protocol Buffers and many more. Bender provides a library of flexible, powerful primitives that can be combined (with plain Go code) to build load testers customized to any use case and that evolve with your service over time.
- JMeter - Java application designed to load test functional behavior and measure performance. It was originally designed for testing Web Applications but has since expanded to other test functions.
- K6 - k6 is a developer-centric, free and open-source load testing tool built for making performance testing a productive and enjoyable experience.
- Gatling - Load test as code.
- phantom - Evgeniy Mamchits' phantom is a very fast (100 000+ RPS) shooter written in C++ (default).
-
-
1. Best Practices
-
10. Application Performance Monitoring Solutions (APM)
-
Anomalies Detection
- DataDog - Unified Monitoring For Metrics, Traces, & Logs.
- Honeycomb - Give all software engineering teams the observability they need to eliminate toil and delight their users.
- NewRelic - Complete view of your applications and operating environment.
- AppDynamics - Business and application performance monitoring.
- Kamon apm - Point and click to find the endpoints, database queries, and API calls that affect your users' experience.
- Netdata - Troubleshoot slowdowns and anomalies in your infrastructure with thousands of per-second metrics, meaningful visualizations, and insightful health alarms with zero configuration.
- Stagemonitor - An open source solution to application performance monitoring for Java server applications.
- Checkmk Server - Monitor your entire hybrid IT infrastructure.
- coroot - Open-source eBPF-based observability tool that turns telemetry data into actionable insights, helping you identify and resolve application issues quickly.
- robusta - Unified Kubernetes monitoring, observability, and operations.
- Kloudfuse - Single unified observability platform for metrics, events, logs and traces.
- Lightstep - Monitoring, observability, and incident response for the world's most reliable systems.
- Aspecto - Troubleshoot performance bottlenecks and errors within your microservices.
- chronosphere - Chronosphere develops a scalable, reliable, and customizable monitoring solution built for cloud-native applications.
- catchpoint - From the edge to the cloud, our proactive observability platform gives you the power to fix problems before your users notice.
- Blue Matador - Easiest and fastest way to monitor your cloud environments on the market.
- Aternity - Simplified high-definition APM visibility leveraging Real User Monitoring, Synthetic Monitoring, and OpenTelemetry, that is scalable, easy to use and deploy, and unifies insights across end users, applications, networks, and the cloud-native ecosystem.
- AppOptics - Continuous monitoring built to scale with your applications for less downtime and lower resource usage.
- Pixie - Open source observability tool for Kubernetes applications. Pixie uses eBPF to automatically capture telemetry data without the need for manual instrumentation.
- swagger-stats - API Telemetry and APM.
- SkyWalking - Application performance monitor tool for distributed systems, especially designed for microservices, cloud native and container-based (Docker, K8s, Mesos) architectures.
- dynatrace APM - Best-in-class APM from the category leader. Ensure application performance, innovate faster, collaborate efficiently, and deliver more value with dramatically less effort.
- Elastic APM - Application performance monitoring system built on the Elastic Stack.
- Icinga - The Icinga stack spans six core strengths that cover all aspects of monitoring.
- Nagios - Computer system, network and infrastructure monitoring software application.
- Sensu - The Observability Pipeline that delivers monitoring as code on any cloud.
- Kieker - Monitoring, analysis and tool integration.
- servicenow - Cloud Observability - Gain AI-powered insights to detect and quickly respond to changes in cloud-native and monolithic applications.
-
-
15. License
-
Anomalies Detection
-
-
5. Transport
-
Events & Problems
- Vector - Collect, transform, and route all your logs and metrics with one simple tool.
- Redis - Redis is an open source, in-memory data structure store, used as a database, cache and message broker. It supports many different data structures such as strings, hashes, lists, etc.
- Rsyslog - RSYSLOG is the rocket-fast system for log processing.
- Eventuate - A platform for developing asynchronous microservices solving the distributed data management problems.
- Mosca - MQTT broker as a module.
- Nanomsg - Socket library that provides several common communication patterns for building distributed systems.
- NATS - Open source, high-performance, lightweight cloud messaging system.
- Pulsar - Distributed pub-sub messaging system.
- Qpid - Cross-platform messaging components built on AMQP.
- RabbitMQ - Open source Erlang-based message broker that just works.
- VerneMQ - Open source, extensible MQTT broker, with enterprise support available.
- Zenoh - A low overhead, low latency, high throughput open-source protocol that blends traditional pub/sub with geo distributed storage, queries and computations for unifying data in motion, data at rest and computations.
-
-
6. Collector
-
Logging
- GoAccess - GoAccess is an open source real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser. It provides fast and valuable HTTP statistics for system administrators that require a visual server report on the fly.
-
-
7. Storage
-
Time Series Database
- VictoriaMetrics - VictoriaMetrics is a fast, cost-effective and scalable monitoring solution and time series database.
- OpenTSDB - OpenTSDB, written in Java.
- kairosDB - Fast Time Series Database on Cassandra.
- Graphite - Store numeric time-series data and render graphs of this data on demand.
- M3DB - Fully open source metrics platform built on M3DB, a distributed timeseries database.
- QuestDB - QuestDB is the fastest open source time series database.
- TimescaleDB - PostgreSQL for time‑series.
-
"Meta Projects" (data storage, multi-tenant, aggregation, high availability, etc)
- qryn - Polyglot monitoring and observability.
- Thanos - Highly available Prometheus setup with long term storage capabilities.
- Opstrace - The Opstrace Distribution is a secure, horizontally-scalable, open source observability platform that you can install in your cloud account.
- Apache HBase - Apache HBase is the Hadoop database, a distributed, scalable, big data store.
-
Search Engine
- Apache Lucene - Java library providing powerful indexing and search features.
- Apache Solr - Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene.
- Elasticsearch - Free and Open, Distributed, RESTful Search Engine.
-
SQL Database
- MySQL - Relational database management system.
- MariaDB - Open source relational database.
- PostgreSQL - Open source relational database.
- CockroachDB - CockroachDB delivers Distributed SQL, combining the familiarity of relational data with limitless, elastic cloud scale, bulletproof resilience… and more.
-
NoSQL Database (The Others :-P)
- MongoDB - MongoDB is a document database with the scalability and flexibility that you want with the querying and indexing that you need.
- RethinkDB - RethinkDB pushes JSON to your apps in realtime.
- SQLite - C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine.
- CouchDB - Seamless multi-master sync, that scales from Big Data to Mobile, with an Intuitive HTTP/JSON API and designed for Reliability.
-
-
8. Visualization
-
General & Tools
- ExplorViz - Live trace visualization for large software landscapes.
- Flame Graph - Visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately.
-
Dashboarding
- Grafana - The first really good dashboard for displaying metrics.
- Uchiwa - Uchiwa is a simple dashboard for the Sensu monitoring framework, built with Go and AngularJS.
- Prometheus - The Prometheus monitoring system and time series database.
-
Tracing
- Jaeger - Monitor and troubleshoot transactions in complex distributed systems.
-
-
9. Processing and Analyze and Act
-
Processing
- Logstash - Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite "stash".
- Fluentd - Fluentd is an open source data collector for unified logging layer.
- Kapacitor - Kapacitor is a real-time streaming data processing engine.
-
Alerts
- Moira - Most powerful alerting system, backed by Graphite.
- Alerta - Tool used to consolidate and de-duplicate alerts from multiple sources for quick ‘at-a-glance’ visualisation.
- Flapjack - Monitoring notification routing & event processing system.
- Cabot - Get alerted when services go down or metrics go crazy.
- Bosun - Time Series Alerting Framework.
-
Anomalies Detection
- Failure Mode and Effects Analysis (FMEA) - Documents current knowledge and actions about the risks of failures, for use in continuous improvement.
- Anomaly Detection Toolkit (ADTK) - Python package for unsupervised / rule-based time series anomaly detection.
-
-
11. Service Mesh
-
Anomalies Detection
- Istio - Istio generates detailed telemetry for all service communications within a mesh.
- Kiali - Observability console for Istio with service mesh configuration capabilities. It helps you to understand the structure of your service mesh by inferring the topology, and also provides the health of your mesh.
-
-
12. Observability as a Service
-
Anomalies Detection
- servicepilot - Modern monitoring platform.
- Mackerel - A SaaS server monitoring service with an intuitive UI, optimized for the cloud era, that fosters a culture of team-based system monitoring/operation.
- Grafana Cloud - Composable observability platform, integrating metrics, traces and logs with Grafana.
- NexClipper - Full stack visibility and intelligence for cloud native applications.
- Sysdig Prometheus - Cloud scale monitoring solution with full Prometheus compatibility.
- CloudWatch - Observability of your AWS resources and applications on AWS and on-premises.
- Google Cloud Monitoring - Gain visibility into the performance, availability, and health of your applications and infrastructure.
- Azure Monitor - Full observability into your applications, infrastructure, and network.
- Guance - China-based "all in one" observability platform that can integrate with any open source collection method.
- Alibaba Cloud Logs Service - Complete real-time data logging service that has been developed by Alibaba Group.
- loggly - See it all in one place. Dozens of log sources, no proprietary agents.
- logiq.ai - Platform to connect observability data from any source to any destination.
- splunk - Extensible data platform powers unified security, full-stack observability and limitless custom applications.
- Tencent Cloud Log Service - Tencent is an internet service portal offering value-added internet, mobile, telecom, and online advertising services.
- Levitate - A Time Series Data Warehouse and Cloud Native Monitoring Solution.
- Epsagon - Application Monitoring Built for Containers and Serverless.
- Instana - IBM® Instana® Observability is the gold standard of incident prevention with automated full-stack visibility, 1-second granularity and 3 seconds to notify.
- sumo logic - Reduce downtime with real-time alerting, dashboards, and machine-learning-powered analytics for all three types of telemetry — logs, metrics, and traces.
- humio - Modern log management with streaming observability and affordable Unlimited Plans.
- Sematext Cloud - Infrastructure and log monitoring with service and log auto-discovery. Basic plan is free. Website uptime, API, and SSL certificate monitoring. Includes status pages and scriptable multi-page user transaction monitoring, etc.
- WhaTap - Provides an integrated monitoring service for DevOps that analyzes, in real time, performance issues of applications running on Kubernetes.
-
Categories
- 3. Collect (33)
- 10. Application Performance Monitoring Solutions (APM) (28)
- 7. Storage (22)
- 12. Observability as a Service (21)
- 4. Load Generators and Synthetic Traffic (14)
- 5. Transport (12)
- 9. Processing and Analyze and Act (10)
- 8. Visualization (6)
- 11. Service Mesh (2)
- 1. Best Practices (2)
- 15. License (1)
- 6. Collector (1)
Sub Categories
- Anomalies Detection (54)
- Events & Problems (29)
- Metrics (26)
- Time Series Database (7)
- Alerts (5)
- NoSQL Database (The Others :-P) (4)
- SQL Database (4)
- "Meta Projects" (data storage, multi-tenant, aggregation, high availability, etc) (4)
- Processing (3)
- Tracing (3)
- Logging (3)
- Dashboarding (3)
- Search Engine (3)
- General & Tools (2)
Keywords
- observability (11)
- monitoring (10)
- load-testing (6)
- metrics (6)
- kubernetes (5)
- performance (4)
- python (4)
- http (3)
- prometheus (3)
- opentelemetry (3)
- open-source (3)
- ml (3)
- llmops (3)
- agent (3)
- generative-ai (3)
- datascience (3)
- model-monitoring (2)
- open-telemetry (2)
- llm (2)
- javascript (2)
- prometheus-exporter (2)
- distributed-tracing (2)
- apm (2)
- grpc (2)
- go (2)
- benchmarking (2)
- system-metrics (1)
- express (1)
- expressjs (1)
- system-information (1)
- procfs (1)
- node-metrics (1)
- machine-metrics (1)
- health-check (1)
- health-checks (1)
- middleware (1)
- monitoring-page (1)
- node (1)
- nodejs (1)
- socket (1)
- kubernetes-exporter (1)
- kubernetes-monitoring (1)
- artifical-intelligence (1)
- good-first-issue (1)
- good-first-issues (1)
- help-wanted (1)
- opentelemetry-python (1)
- nextjs (1)
- opentelemetry-javascript (1)
- typescript (1)