{"id":15137725,"url":"https://github.com/amoeba/arrow-opentelemetry-example","last_synced_at":"2026-02-08T12:31:24.568Z","repository":{"id":66554349,"uuid":"561066371","full_name":"amoeba/arrow-opentelemetry-example","owner":"amoeba","description":"Example of using OpenTelemetry and Apache Arrow","archived":false,"fork":false,"pushed_at":"2024-12-17T19:46:48.000Z","size":113,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-12-17T20:33:54.969Z","etag":null,"topics":["apache-arrow","cpp","open-telemetry"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amoeba.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-02T21:46:21.000Z","updated_at":"2024-12-17T19:46:52.000Z","dependencies_parsed_at":null,"dependency_job_id":"cbb2b88d-b27c-4cc2-9d3a-3ee0a1fb87bf","html_url":"https://github.com/amoeba/arrow-opentelemetry-example","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amoeba%2Farrow-opentelemetry-example","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amoeba%2Farrow-opentelemetry-example/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amoeba%2Farrow-opentelemetry-example/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amoeba%2Farrow-opentelemetry-example/manifests","owner_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amoeba","download_url":"https://codeload.github.com/amoeba/arrow-opentelemetry-example/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230451332,"owners_count":18227901,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-arrow","cpp","open-telemetry"],"created_at":"2024-09-26T07:01:45.169Z","updated_at":"2026-02-08T12:31:19.547Z","avatar_url":"https://github.com/amoeba.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# arrow-opentelemetry-example\n\nWorked example of how to view [OpenTelemetry](https://opentelemetry.io/) traces emitted by [Apache Arrow](https://arrow.apache.org/) using [Jaeger UI](https://www.jaegertracing.io/).\n\n![Screenshot of the Jaeger user interface showing a portion of a table. The table has the header Serivce and Operation and has a four rows that show a nested set of calls to various Arrow and Parquet functions](./images/readme-screenshot.jpeg)\n\n## Pre-requisites\n\nTo follow all of the steps, you will need:\n\n- [git](https://git-scm.com)\n- [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) or another way to set up a [libarrow development environment](https://arrow.apache.org/docs/developers/cpp/index.html)\n- [Docker](https://www.docker.com/) + w/ `docker compose` or equivalent\n\n## Steps\n\nAt a high level, we:\n\n1. Make a custom build of libarrow with `ARROW_WITH_OPENTELEMETRY` enabled\n2. 
Build pyarrow against that build so we can run a script of interest and produce traces\n3. Start a minimal OpenTelemetry stack (i.e., Jaeger UI, an OpenTelemetry collector) to collect and view those traces\n4. Run our script to produce traces\n5. View the results in Jaeger UI\n\n## Make a custom build of libarrow\n\nPublished distributions of Apache Arrow don't come with OpenTelemetry tracing enabled, so we have to check out Apache Arrow from source and build it ourselves.\nThese steps are very similar to the [official documentation](https://arrow.apache.org/docs/developers/cpp/building.html) but all steps are reproduced here for completeness.\n\n```sh\ngit clone https://github.com/apache/arrow\ncd arrow\n```\n\nI use Conda here because it makes it easy to spin up a complete and clean build environment:\n\n```sh\nconda create -y -n arrow-tracing \\\n    --channel=conda-forge \\\n    --file ci/conda_env_unix.txt \\\n    --file ci/conda_env_cpp.txt \\\n    --file ci/conda_env_python.txt \\\n    --file ci/conda_env_sphinx.txt \\\n    clang_osx-arm64=14 \\\n    clang-tools=14 \\\n    compilers \\\n    python=3.10\nconda activate arrow-tracing\nexport ARROW_HOME=$CONDA_PREFIX\n```\n\nOnce we're in our checkout of the Apache Arrow source, we need to make a fresh directory for our build.\nWe'll do that under the `cpp` directory at `./cpp/build`.\n\n```sh\ncd cpp\nmkdir build\ncd build\n```\n\nNext, run cmake with the options required for your use case or use the following.\nNote: It's critical that you set `-DARROW_WITH_OPENTELEMETRY=\"ON\"`.\n\n```sh\ncmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \\\n      -DCMAKE_INSTALL_LIBDIR=lib \\\n      -DARROW_BUILD_INTEGRATION=\"ON\" \\\n      -DARROW_BUILD_STATIC=\"OFF\" \\\n      -DARROW_BUILD_TESTS=\"ON\" \\\n      -DARROW_COMPUTE=\"ON\" \\\n      -DARROW_CSV=\"ON\" \\\n      -DARROW_DATASET=\"ON\" \\\n      -DARROW_EXTRA_ERROR_CONTEXT=\"ON\" \\\n      -DARROW_FILESYSTEM=\"ON\" \\\n      
-DARROW_JSON=\"ON\" \\\n      -DARROW_MIMALLOC=\"ON\" \\\n      -DARROW_PARQUET=\"ON\" \\\n      -DARROW_WITH_BROTLI=\"ON\" \\\n      -DARROW_WITH_BZ2=\"ON\" \\\n      -DARROW_WITH_LZ4=\"ON\" \\\n      -DARROW_WITH_RE2=\"ON\" \\\n      -DARROW_WITH_SNAPPY=\"ON\" \\\n      -DARROW_WITH_UTF8PROC=\"ON\" \\\n      -DARROW_WITH_ZLIB=\"ON\" \\\n      -DARROW_WITH_ZSTD=\"ON\" \\\n      -DCMAKE_BUILD_TYPE=\"Debug\" \\\n      -DGTest_SOURCE=BUNDLED \\\n      -DARROW_WITH_OPENTELEMETRY=\"ON\" \\\n  ..\n```\n\nCompile and install into our Conda environment:\n\n```sh\nmake -j8\nmake install\n```\n\n## Build pyarrow\n\nWhile we could produce traces in any environment that uses libarrow, for ease of setup, we will use [PyArrow](https://arrow.apache.org/docs/python).\nThe steps here are very similar to the [official documentation](https://arrow.apache.org/docs/developers/python.html) but are included here for completeness.\n\nFirst, `cd` into the `python` subdirectory of your Apache Arrow checkout:\n\n```sh\ncd ../..\ncd python\n```\n\nSet `PYARROW_PARALLEL=8` so our build uses more than one core:\n\n```sh\nexport PYARROW_PARALLEL=8\n```\n\nTo get the script we're going to run below to work, we'll need to customize our build to support the [Dataset API](https://arrow.apache.org/docs/python/dataset.html):\n\n```sh\nexport PYARROW_WITH_DATASET=1\n```\n\nLast, build PyArrow and install into the Conda environment:\n\n```sh\npython setup.py build_ext --inplace\npython -m pip install -e . 
--no-build-isolation\n```\n\nAt this point, our Conda environment has our custom build of libarrow and a build of PyArrow that knows how to use it.\n\n## Start a minimal OpenTelemetry stack\n\nFirst, go back up to the root of this repository:\n\n```sh\ncd ../..\n```\n\nThen start up the OpenTelemetry stack:\n\n```sh\ndocker compose up -d\n```\n\nNote: This may take some time depending on the speed of your network connection.\n\n## Run our script\n\nWhen we run our PyArrow code, libarrow will look for a few environment variables to enable tracing and configure it appropriately for our setup.\n\nThe first enables exporting OpenTelemetry traces over HTTP:\n\n```sh\nexport ARROW_TRACING_BACKEND=otlp_http\n```\n\nNote: This is documented in the [Environment Variables](https://arrow.apache.org/docs/cpp/env_vars.html#envvar-ARROW_TRACING_BACKEND) documentation.\n\nThe last two (1) configure where our traces will get exported and (2) give our script a recognizable name.\n\n```sh\nexport OTEL_EXPORTER_OTLP_ENDPOINT=\"http://localhost:4318\"\nexport OTEL_RESOURCE_ATTRIBUTES=\"service.name=myservice\"\n```\n\nThe script we're going to run is a simple example taken from the [PyArrow documentation](https://arrow.apache.org/docs/python/compute.html#table-and-dataset-joins).\n\n```python\n# From https://arrow.apache.org/docs/python/compute.html#table-and-dataset-joins\nimport pyarrow as pa\n\ntable1 = pa.table({'id': [1, 2, 3],\n                   'year': [2020, 2022, 2019]})\n\ntable2 = pa.table({'id': [3, 4],\n                   'n_legs': [5, 100],\n                   'animal': [\"Brittle stars\", \"Centipede\"]})\n\njoined_table = table1.join(table2, keys=\"id\")\n```\n\nTo run the script:\n\n```sh\npython example.py\n```\n\n## View the results in Jaeger UI\n\nIf everything is set up correctly, your script should have executed, sent off some traces, and exited successfully.\n\nNow, just visit http://localhost:16686 in a web browser of 
your choosing and you should see \"myservice\" listed under Service.\n\n![Screenshot of the Jaeger UI showing two tabs at the top labeled Search and JSON File. The Search tab is selected. Below it there is a list of services labeled \"jaeger-query\" and \"myservice\" in a dropdown.](./images/jaeger-services-dropdown.png)\n\nFrom there, click \"Find Traces\" to see your traces.\nOnce you find a trace to view, you should see something like this:\n\n![Full screenshot of the Jaeger UI showing a detailed view of a trace](./images/jaeger-trace-screenshot.jpeg)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famoeba%2Farrow-opentelemetry-example","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famoeba%2Farrow-opentelemetry-example","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famoeba%2Farrow-opentelemetry-example/lists"}