{"id":17468267,"url":"https://github.com/mishmash-io/opentelemetry-server-embedded","last_synced_at":"2026-02-11T13:04:10.709Z","repository":{"id":228560294,"uuid":"772921132","full_name":"mishmash-io/opentelemetry-server-embedded","owner":"mishmash-io","description":"OpenTelemetry and logs, metrics, traces, profiles analytics, united by mishmash io.","archived":false,"fork":false,"pushed_at":"2026-01-30T12:22:18.000Z","size":1279,"stargazers_count":33,"open_issues_count":2,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-01-31T05:26:03.881Z","etag":null,"topics":["apache","druid","java","opentelemetry","otlp","parquet","superset","vertx"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mishmash-io.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-03-16T08:41:07.000Z","updated_at":"2026-01-30T12:22:16.000Z","dependencies_parsed_at":"2024-09-09T13:29:53.301Z","dependency_job_id":"a262594b-f1de-4fde-80ff-a4fd601c25c3","html_url":"https://github.com/mishmash-io/opentelemetry-server-embedded","commit_stats":null,"previous_names":["mishmash-io/opentelemetry-server-embedded"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/mishmash-io/opentelemetry-server-embedded","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mishmash-io%2Fopentelemetry-server-embedded",
"tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mishmash-io%2Fopentelemetry-server-embedded/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mishmash-io%2Fopentelemetry-server-embedded/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mishmash-io%2Fopentelemetry-server-embedded/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mishmash-io","download_url":"https://codeload.github.com/mishmash-io/opentelemetry-server-embedded/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mishmash-io%2Fopentelemetry-server-embedded/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28944823,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-31T13:02:32.153Z","status":"ssl_error","status_checked_at":"2026-01-31T13:00:07.528Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache","druid","java","opentelemetry","otlp","parquet","superset","vertx"],"created_at":"2024-10-18T15:05:42.746Z","updated_at":"2026-02-11T13:04:10.701Z","avatar_url":"https://github.com/mishmash-io.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenTelemetry and Apache Big Data, United by mishmash 
io\n\n[![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/mishmash-io/opentelemetry-server-embedded/badge)](https://scorecard.dev/viewer/?uri=github.com/mishmash-io/opentelemetry-server-embedded)\n\n\nThis repository contains code that receives and adapts [OpenTelemetry](https://opentelemetry.io/) signals - like `logs`, `metrics`, `traces` and `profiles` - to Open Source projects of the [Apache](https://www.apache.org/) analytics ecosystem.\n\n**Blend** and **bundle** them to build your own **Observability analytics backends:**\n- for batch processing with Apache Spark or Hive\n- for real-time analytics with Apache Druid and Apache Superset\n- for Machine Learning and AI\n\nYou will also find additional tools, examples and demos that might be of service on your own OpenTelemetry journey.\n\n\u003e [!TIP]\n\u003e This is a public release of code we have accumulated internally over time and so far contains only a limited subset of what we intend to share.\n\u003e\n\u003e Examples of internal software that will be published here in the near future include:\n\u003e \n\u003e - A small OTLP server based on [Apache BookKeeper](https://bookkeeper.apache.org/) for improved\n\u003e   data ingestion reliability, even across node failures\n\u003e - OpenTelemetry Data Sources for [Apache Pulsar](https://pulsar.apache.org/) for when\n\u003e   more complex preprocessing is needed\n\u003e - Our [Testcontainers](https://testcontainers.com/) implementations that you can use to\n\u003e   ensure your apps always produce the necessary telemetry, or to track performance across\n\u003e   releases\n\u003e\n\u003e Watch this repository for updates.\n\n***Contents:***\n\n- [How OpenTelemetry compares to other telemetry software](#why-you-should-switch-to-opentelemetry)\n- [Introduction to OpenTelemetry for Developers, Data Engineers and Data Scientists](#opentelemetry-for-developers-data-engineers-and-data-scientists)\n- [When and where should you use the code 
here](#when-and-where-should-you-use-the-software-in-this-repository)\n- [Software artifacts to:](#artifacts)\n  - [Embed OTLP collectors in Java systems](#embeddable-collectors)\n  - [Save OpenTelemetry to Apache Parquet files](#apache-parquet-stand-alone-server)\n  - [Ingest OpenTelemetry into Apache Druid](#apache-druid-otlp-input-format)\n  - [Visualize OpenTelemetry with Apache Superset](#apache-superset-charts-and-dashboards)\n- [More about OpenTelemetry at mishmash io](#opentelemetry-at-mishmash-io)\n\n# Why you should switch to OpenTelemetry\n\nIf you are new to OpenTelemetry you might be wondering how it is better than the multitude of\nexisting telemetry implementations, many of which are already available or well established within\npopular runtimes like Kubernetes, for example.\n\nThere are a number of advantages that OpenTelemetry offers compared to earlier telemetries:\n\n- All signal types (`logs`, `metrics`, `traces` and `profiles`) are ***correlatable:***\n  \n  For example - you can explore ***only*** the `logs` emitted inside a given (potentially failing) `span`.\n\n  To see how `telemetry signal correlation` works - refer to the [OpenTelemetry for Developers, Data Engineers and Data Scientists](#opentelemetry-for-developers-data-engineers-and-data-scientists) examples section below.\n- More precise timing:\n  \n  Unlike other telemetries, OpenTelemetry does not `pull` data, it `pushes` it. By avoiding the\n  extra request needed to pull data - OpenTelemetry reports much more accurate timestamps of\n  when `logs` or `spans` and other events were emitted, or `metrics` values were updated.\n- Zero-code telemetry:\n  \n  You can add telemetry to your existing apps without any code modifications. If you're using\n  popular frameworks - they already have OpenTelemetry instrumentation that will just work out\n  of the box. 
See the [OpenTelemetry docs for your programming language.](https://opentelemetry.io/docs/languages/)\n\n  Also, you do not need to implement special endpoints and request handlers to supply telemetry.\n- No CPU overhead if telemetry is not emitted:\n  \n  When code instrumented with OpenTelemetry runs ***without*** a configured signals exporter\n  (that is, when it is disabled) - all OpenTelemetry API methods are effectively empty.\n\n  They do not perform any operations, thus not requiring any CPU. \n- Major companies already support OpenTelemetry:\n  \n  Large infrastructure providers - public clouds like Azure, AWS and GCP already seamlessly integrate their monitoring and observability services with OpenTelemetry.\n  \n  Instrumenting your code with OpenTelemetry means it can be monitored on any of them, without\n  code changes.\n\nIf the above sounds convincing - keep reading through this document and explore the links in it.\n\n# OpenTelemetry for Developers, Data Engineers and Data Scientists\n\nWe have prepared a few Jupyter notebooks that visually explore OpenTelemetry data that we collected from [a demo Astronomy webshop app](https://github.com/mishmash-io/opentelemetry-demos)\nusing the [Apache Parquet Stand-alone server](./server-parquet) contained in this repository.\n\n\u003e [!TIP]\n\u003e If you are the sort of person who prefers to learn by looking at **actual data** - start with the [OpenTelemetry Basics Notebook.](./examples/notebooks/basics.ipynb)\n\n# When and where should you use the software in this repository\n\nWe, at [mishmash io,](https://mishmash.io/) have been using OpenTelemetry for quite some time - recording telemetry from experiments, unit and integration tests - to ensure every new release\nof software we develop is performing better than the last, and within reasonable computing-resource usage. 
(More on this [here.](https://mishmash.io/open_source/opentelemetry))\n\n\u003e [!TIP]\n\u003e OpenTelemetry is great for **monitoring software in production,** but we believe you should adopt it within your **software development process** too.\n\nHaving been through that journey ourselves, we've realised that success depends on strong analytics. OpenTelemetry provides a number of tools to [instrument your code](https://opentelemetry.io/docs/concepts/instrumentation/) to emit signals, and then to compose data transmission pipelines for these signals. It leaves it to you to decide what you ultimately want to do with your signals: where you want to store them depends on how you will work with them.\n\nYou can compose such pipelines for signal transmission using the [OpenTelemetry Collector,](https://opentelemetry.io/docs/collector/) which in turn uses a network protocol called [OTLP.](https://opentelemetry.io/docs/specs/otel/protocol/) At the end - you have to `terminate` the pipelines into an `observability (or OTLP) backend.`\n\nAs a network protocol, OTLP is great at reducing the number of bytes transmitted, keeping the throughput high with minimum overhead. It does this by heavily `nesting` its messages - to avoid\ndata duplication and take maximum advantage of `dictionary encodings` and data compression.\n\nOn the **analytics side** though - heavily nested structures are not optimal. A simple `count(*)` or\n`sum()` query, done over millions of OTLP messages, will have to `unnest` each one of them. 
Every time you run that query.\n\nAnd this is the second reason why we believe you might find the software here useful:\n\n\u003e [!TIP]\n\u003e When doing analytics on your observability data - you need a suitable data schema.\n\u003e\n\u003e The tools in this repository convert OTLP messages into a 'flatter' schema, that's more suitable\n\u003e for analytics.\n\u003e\n\u003e They perform transformations **only once** - on **OTLP packet reception,** to minimize the overhead that would otherwise be incurred every time you run an analytics job or query.\n\nFollowing are quick introductions of the individual software packages, where you can find more information.\n\n\u003e [!TIP]\n\u003e If you're wondering how to get your first OpenTelemetry data sets - check out [our fork of OpenTelemetry's Demo app.](https://github.com/mishmash-io/opentelemetry-demos)\n\u003e\n\u003e In there you will find complete deployments that will generate signals, save them and let you play with the data - by writing your own notebooks or creating\n\u003e Apache Superset dashboards.\n\u003e \n\n# Artifacts\n\n## Embeddable collectors\n\nThe base artifact - `collector-embedded` contains classes that handle the OTLP protocol (over both gRPC and HTTP).\n- [README](./collector-embedded)\n- [Javadoc on javadoc.io](https://javadoc.io/doc/io.mishmash.opentelemetry/collector-embedded)\n\n## Apache Parquet Stand-alone server\n\nThis artifact contains a runnable OTLP-protocol server that receives signals from OpenTelemetry and saves them into [Apache Parquet](https://parquet.apache.org/) files.\n\nIt is not intended for production use, but rather as a quick tool to save and explore OpenTelemetry data locally. 
[The Basics Jupyter Notebook](./examples/notebooks/basics.ipynb) explores\nParquet files as saved by this Stand-alone server.\n- [README](./server-parquet)\n- [Javadoc on javadoc.io](https://javadoc.io/doc/io.mishmash.opentelemetry/server-parquet)\n- [Quick deployment with a demo app](https://github.com/mishmash-io/opentelemetry-demos)\n\n## Apache Druid OTLP Input Format\n\nUse this artifact when ingesting OpenTelemetry signals into [Apache Druid](https://druid.apache.org), in combination with an Input Source (like Apache Kafka or other).\n\nApache Druid is a high performance, real-time analytics database that delivers sub-second queries on streaming and batch data at scale and under load. This makes it perfect for OpenTelemetry data analytics.\n\nWith this OTLP Input Format you can build OpenTelemetry ingestion pipelines into Apache\nDruid. For example:\n- Use the [OpenTelemetry Kafka Exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/kafkaexporter/README.md) to publish\nOTLP signals to an Apache Kafka topic, then the [Druid Kafka Ingestion](https://druid.apache.org/docs/latest/ingestion/kafka-ingestion/) with this Input Format to get Druid\ntables with your telemetry.\n- In a similar way you can also use other Druid input sources developed by mishmash io -\nlike with [Apache BookKeeper](https://bookkeeper.apache.org) or [Apache Pulsar](https://pulsar.apache.org). 
For details - check the related artifact documentation.\n\nFind out more about the OTLP Input Format for Apache Druid:\n- [README](./druid-otlp-format)\n- [Javadoc on javadoc.io](https://javadoc.io/doc/io.mishmash.opentelemetry/druid-otlp-format)\n- [Quick deployment with a demo app and Apache Superset](https://github.com/mishmash-io/opentelemetry-demos)\n\n## Apache Superset charts and dashboards\n\n![superset-dashboard](https://github.com/user-attachments/assets/8dba1e13-bcb3-41c9-ac40-0c023a3825c8)\n\n[Apache Superset](https://superset.apache.org/) is an open-source modern data exploration and visualization platform.\n\nYou can use its rich visualizations, no-code viz builder and its powerful SQL IDE to build your own OpenTelemetry analytics.\n\nTo get you started, we're publishing [data sources and visualizations](./superset-visualizations) that you can import into Apache Superset.\n\n- [Quick deployment with a demo app](https://github.com/mishmash-io/opentelemetry-demos)\n  \n# OpenTelemetry at mishmash io\n\n[![GitHub followers](https://img.shields.io/github/followers/mishmash-io)](https://github.com/mishmash-io) [![Bluesky posts](https://img.shields.io/bluesky/posts/mishmash.io)](https://bsky.app/profile/mishmash.io) [![GitHub Discussions](https://img.shields.io/github/discussions/mishmash-io/about?logo=github\u0026logoColor=white)](https://github.com/orgs/mishmash-io/discussions) [![Discord](https://img.shields.io/discord/1208043287001169990?logo=discord\u0026logoColor=white)](https://discord.gg/JqC6VMZTgJ)\n\nOpenTelemetry's main intent is the observability of production environments, but at [mishmash io](https://mishmash.io) it is part of our software development process. 
By saving telemetry from **experiments** and **tests** of \nour own algorithms we keep things like **performance** and **resource usage** of our distributed database in check, continuously and across releases.\n\nWe believe that adopting OpenTelemetry as a software development tool might be useful to you too, which is why we decided to open-source the tools we've built.\n\nLearn more about the broader set of [OpenTelemetry-related activities](https://mishmash.io/open_source/opentelemetry) at\n[mishmash io](https://mishmash.io/) and `follow` our [GitHub profile](https://github.com/mishmash-io) for updates and new releases.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmishmash-io%2Fopentelemetry-server-embedded","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmishmash-io%2Fopentelemetry-server-embedded","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmishmash-io%2Fopentelemetry-server-embedded/lists"}