{"id":46388768,"url":"https://github.com/smithclay/otlp2records","last_synced_at":"2026-05-30T21:01:01.308Z","repository":{"id":332045348,"uuid":"1132341722","full_name":"smithclay/otlp2records","owner":"smithclay","description":"shared library for converting OTLP data into records for parquet, flattened json, avro, clickhouse, etc","archived":false,"fork":false,"pushed_at":"2026-05-27T01:17:44.000Z","size":941,"stargazers_count":4,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-27T01:20:53.902Z","etag":null,"topics":["opentelemetry","parquet"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/smithclay.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-11T19:26:53.000Z","updated_at":"2026-05-17T21:58:43.000Z","dependencies_parsed_at":"2026-01-12T16:09:34.053Z","dependency_job_id":null,"html_url":"https://github.com/smithclay/otlp2records","commit_stats":null,"previous_names":["smithclay/otlp2records"],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/smithclay/otlp2records","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smithclay%2Fotlp2records","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smithclay%2Fotlp2records/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smithclay%2Fotlp2records/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smithclay%2Fotlp2records/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/smithclay","download_url":"https://codeload.github.com/smithclay/otlp2records/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smithclay%2Fotlp2records/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33709269,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-30T02:00:06.278Z","response_time":92,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["opentelemetry","parquet"],"created_at":"2026-03-05T08:13:28.045Z","updated_at":"2026-05-30T21:01:01.296Z","avatar_url":"https://github.com/smithclay.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# otlp2records\n\n[![Crates.io](https://img.shields.io/crates/v/otlp2records.svg)](https://crates.io/crates/otlp2records)\n\nTransform OTLP telemetry (logs, traces, metrics) into Arrow RecordBatches.\n\nA high-performance, WASM-compatible library for converting OpenTelemetry Protocol (OTLP) data to Apache Arrow format for efficient storage and querying.\n\nCurrently consumed by [duckdb-otlp](https://github.com/smithclay/duckdb-otlp), [otlp2parquet](https://github.com/smithclay/otlp2parquet) and [otlp2pipeline](https://github.com/smithclay/otlp2pipeline).\n\n## Design Principles\n\n- **No I/O**: Core never touches network or filesystem\n- **No async**: Pure synchronous transforms\n- **WASM-first**: All dependencies compile to wasm32\n- **Arrow-native**: RecordBatch is the canonical output format\n\n## Features\n\n- Transform OTLP logs, traces, and metrics to Arrow RecordBatches\n- Support for both Protobuf and JSON input formats\n- Output to NDJSON, Arrow IPC, or Parquet\n- Direct OTLP-to-Arrow hot path for high-throughput ingestion\n- JSON/JSONL support through OTLP request normalization into the same Arrow builders\n\n## Installation\n\nAdd to your `Cargo.toml`:\n\n```toml\n[dependencies]\notlp2records = \"0.8\"\n\n# Optional: Enable Parquet output\notlp2records = { version = \"0.8\", features = [\"parquet\"] }\n\n# Optional: Enable WASM bindings\notlp2records = { version = \"0.8\", features = [\"wasm\"] }\n```\n\n## Usage\n\n### Rust API\n\n#### High-level API (Recommended)\n\n```rust\nuse otlp2records::{transform_logs, transform_traces, transform_metrics, InputFormat};\n\n// Transform OTLP logs\nlet bytes: \u0026[u8] = /* OTLP log data */;\nlet batch = transform_logs(bytes, InputFormat::Protobuf)?;\nprintln!(\"Transformed {} log records\", batch.num_rows());\n\n// Transform OTLP traces\nlet batch = transform_traces(bytes, InputFormat::Json)?;\nprintln!(\"Transformed {} spans\", batch.num_rows());\n\n// Transform OTLP metrics (returns separate batches by type)\nlet batches = transform_metrics(bytes, InputFormat::Protobuf)?;\nif let Some(gauge) = batches.gauge {\n    println!(\"Transformed {} gauge metrics\", gauge.num_rows());\n}\nif let Some(sum) = batches.sum {\n    println!(\"Transformed {} sum metrics\", sum.num_rows());\n}\n```\n\n#### Output Formats\n\n```rust\nuse otlp2records::{transform_logs, to_json, to_ipc, InputFormat};\n\nlet batch = transform_logs(bytes, InputFormat::Protobuf)?;\n\n// Output as NDJSON\nlet ndjson: Vec\u003cu8\u003e = to_json(\u0026batch)?;\n\n// Output as Arrow IPC (streaming format)\nlet ipc: Vec\u003cu8\u003e = to_ipc(\u0026batch)?;\n\n// Output as Parquet (requires \"parquet\" feature)\n#[cfg(feature = \"parquet\")]\nlet parquet: Vec\u003cu8\u003e = otlp2records::to_parquet(\u0026batch)?;\n```\n\n### WASM Usage\n\nBuild with the `wasm` feature for browser/Node.js environments:\n\n```bash\ncargo build --target wasm32-unknown-unknown --features wasm\n```\n\n```javascript\nimport init, { transform_logs_wasm } from './otlp2records.js';\n\nawait init();\n\n// Transform OTLP logs (Uint8Array) to Arrow IPC\nconst otlpBytes = new Uint8Array(/* ... */);\nconst arrowIpc = transform_logs_wasm(otlpBytes, \"protobuf\");\n```\n\n## API Overview\n\n### Input Formats\n\n| Format | Description |\n|--------|-------------|\n| `InputFormat::Protobuf` | Standard OTLP protobuf encoding |\n| `InputFormat::Json` | OTLP JSON encoding (camelCase field names) |\n| `InputFormat::Jsonl` | Newline-delimited OTLP JSON envelopes |\n| `InputFormat::Auto` | Auto-detect JSON vs protobuf with fallback decoding |\n\n### High-level Functions\n\n| Function | Description |\n|----------|-------------|\n| `transform_logs(bytes, format)` | Transform OTLP logs to Arrow RecordBatch |\n| `transform_traces(bytes, format)` | Transform OTLP traces to Arrow RecordBatch |\n| `transform_metrics(bytes, format)` | Transform OTLP metrics to MetricBatches |\n\n### Schema Output Selection\n\nThe default output is `SchemaOutput::Normalized`, the flattened ClickStack-compatible\nschema used by the existing `transform_logs`, `transform_traces`, and\n`transform_metrics` APIs. The aliases `\"normalized\"`, `\"clickstack\"`,\n`\"clickstack-mode\"`, `\"\"`, and `\"default\"` all parse to this default.\n\nRust callers can opt into `SchemaOutput::OtapStar` with explicit APIs:\n\n| Function | Description |\n|----------|-------------|\n| `transform_logs_with_schema(bytes, format, schema_output)` | Transform logs to `LogsOutput::Normalized` or `LogsOutput::OtapStar` |\n| `transform_traces_with_schema(bytes, format, schema_output)` | Transform traces to `TracesOutput::Normalized` or `TracesOutput::OtapStar` |\n| `transform_metrics_with_schema(bytes, format, schema_output)` | Transform metrics to `MetricsOutput::Normalized` or `MetricsOutput::OtapStar` |\n\n`otap-star` / `otap_star` emits multi-table Arrow batches modeled after the\nOpenTelemetry otel-arrow data model. Instead of flattened JSON columns such as\n`events_json`, `links_json`, `metric_attributes`, or `exemplars_json`, child\nentities are emitted as separate tables keyed by deterministic `id` and\n`parent_id` columns. Use `iter_named_batches()` on `OtapLogsBatches`,\n`OtapTracesBatches`, or `OtapMetricsBatches` to serialize each named table.\n\nThe FFI and WASM bindings continue to expose the normalized single-batch shape in\nthis release. `otap-star` is Rust API only to avoid changing those ABIs.\n\n### Breaking Changes In 0.8.0\n\nThe 0.7 to 0.8 release intentionally changes the default normalized schema. The\nexisting `transform_logs`, `transform_traces`, and `transform_metrics` APIs still\nreturn flattened batches by default, but downstream code that selects columns by\nname or expects specific Arrow physical types must be updated.\n\nKey normalized-schema changes:\n\n- OTLP/OTAP field names replace older ClickStack-style names: for example,\n  `timestamp` becomes `time_unix_nano` for logs and metrics,\n  trace `timestamp` becomes `start_time_unix_nano`, `span_name` becomes `name`,\n  `span_kind` becomes `kind`, and metric `metric_name`/`metric_description`/\n  `metric_unit` become `name`/`description`/`unit`.\n- Timestamps now use Arrow `Timestamp(Nanosecond)` instead of microsecond or\n  millisecond-scaled integer columns. Span duration is\n  `duration_time_unix_nano` with Arrow `Duration(Nanosecond)`.\n- Trace and span identifiers are Arrow `FixedSizeBinary(16)` and\n  `FixedSizeBinary(8)` instead of hex strings.\n- Metric number values are split into nullable `int_value` and `double_value`\n  columns instead of a single `Float64` `value` column.\n- Histogram bucket columns now use typed Arrow list columns instead of JSON\n  strings, and dropped counts/flags/count fields use unsigned Arrow integer\n  types where OTAP does.\n\nThe flattened JSON convenience columns remain for now: `resource_attributes`,\n`scope_attributes`, signal attribute JSON columns, `events_json`, `links_json`,\nand `exemplars_json`. The new `otap-star` output is the more relational\nmulti-table shape for callers that want child tables instead of flattened JSON.\n\n### Transform Observation\n\nProduction callers can opt into phase timings and counters without changing output semantics:\n\n| Function | Description |\n|----------|-------------|\n| `transform_logs_with_observer(bytes, format, observer)` | Transform logs and report decode/build/append/finalize phases |\n| `transform_traces_with_observer(bytes, format, observer)` | Transform traces and report decode/build/attribute JSON/append/finalize phases |\n| `transform_metrics_with_observer(bytes, format, observer)` | Transform metrics and report decode/capacity/context/append/finalize phases |\n\nImplement `TransformObserver` to receive `TransformPhaseTiming` and `TransformCounterValue`\nevents. Counters include duplicate resource/scope context hits and misses plus repeated\nresource/scope attribute row-copy counts and bytes.\n\nTo observe an OTAP star transform, use the `*_with_schema_and_observer` entry\npoints, which route to either schema and thread the observer through:\n\n| Function | Description |\n|----------|-------------|\n| `transform_logs_with_schema_and_observer(bytes, format, schema_output, observer)` | Logs transform with both schema selection and observer |\n| `transform_traces_with_schema_and_observer(bytes, format, schema_output, observer)` | Traces transform with both schema selection and observer |\n| `transform_metrics_with_schema_and_observer(bytes, format, schema_output, observer)` | Metrics transform with both schema selection and observer |\n\nThe OTAP path emits the same phase enum (`ProtobufDecode`, `JsonDecode`,\n`JsonlDecode`, `BuilderInit`, `ResourceLogsBuild` / `ResourceSpansBuild` /\n`ResourceMetricsBuild`, the matching `Scope*Build` and per-record\n`LogRecordBuild` / `SpanBuild` / `MetricBuild`, and `ArrowFinalize`) plus the\n`OutputRows`, `Resource/ScopeContextDuplicateHit`, and\n`Resource/ScopeContextDuplicateMiss` counters. The\n`Resource/ScopeAttributesRowCopies*` counters are normalized-only — OTAP\nemits attributes as their own child tables, so no row replication happens.\n\n### Output Functions\n\n| Function | Description |\n|----------|-------------|\n| `to_json(\u0026batch)` | Convert RecordBatch to NDJSON bytes |\n| `to_ipc(\u0026batch)` | Convert RecordBatch to Arrow IPC format |\n| `to_parquet(\u0026batch)` | Convert RecordBatch to Parquet (requires feature) |\n\nThese serializers operate on one `RecordBatch` at a time. For `otap-star`, call\nthem per table by iterating named batches.\n\n### Schemas\n\n| Function | Description |\n|----------|-------------|\n| `logs_schema()` | Arrow schema for log records |\n| `traces_schema()` | Arrow schema for trace spans |\n| `gauge_schema()` | Arrow schema for gauge metrics |\n| `sum_schema()` | Arrow schema for sum metrics |\n\n## Architecture\n\n```\n                              +-------------------+\n                              |   OTLP Input      |\n                              | (Protobuf / JSON) |\n                              +---------+---------+\n                                        |\n                                        v\n                              +---------+---------+\n                              |   Format Dispatch |\n                              | (protobuf/jsonl)  |\n                              +---------+---------+\n                                        |\n                                        v\n                              +---------+---------+\n                              | OTLP Request      |\n                              | (prost structs)   |\n                              +---------+---------+\n                                        |\n                                        v\n                              +---------+---------+\n                              | Arrow Builders    |\n                              | (direct columns)  |\n                              +---------+---------+\n                                        |\n                                        v\n                              +---------+---------+\n                              |   RecordBatch     |\n                              +---------+---------+\n                                        |\n                  +---------------------+---------------------+\n                  |                     |                     |\n                  v                     v                     v\n          +-------+-------+     +-------+-------+     +-------+-------+\n          |    NDJSON     |     |   Arrow IPC   |     |    Parquet    |\n          +---------------+     +---------------+     +---------------+\n```\n\n### Public Surface\n\n- **transform functions**: Convert OTLP logs, traces, and metrics to Arrow batches\n- **schema functions**: Return the Arrow schemas used by the transform functions\n- **partition helpers**: Group transformed batches by service\n- **output helpers**: Serialize RecordBatches to NDJSON, Arrow IPC, or Parquet\n- **wasm**: WASM bindings (optional)\n\n## Output Schemas\n\n`SchemaOutput::Normalized` is the default flattened schema. In 0.8.0 it uses\nOTAP-compatible field names and high-value Arrow physical types while keeping\nthe flattened resource/scope/attribute convenience columns. The `clickstack`\nand `clickstack-mode` schema aliases still select this normalized output.\n\n### Logs Schema\n\n| Field | Type | Description |\n|-------|------|-------------|\n| time_unix_nano | TimestampNanosecond | Log record timestamp |\n| observed_time_unix_nano | TimestampNanosecond | When log was observed |\n| trace_id | FixedSizeBinary(16) | Trace correlation ID |\n| span_id | FixedSizeBinary(8) | Span correlation ID |\n| service_name | String | Service name from resource |\n| service_namespace | String | Service namespace |\n| service_instance_id | String | Service instance ID |\n| severity_number | Int32 | Numeric severity (1-24) |\n| severity_text | String | Severity string (DEBUG, INFO, etc.) |\n| event_name | String | Log event name |\n| body | String | Log message body |\n| resource_attributes | String | JSON-encoded resource attributes |\n| scope_name | String | Instrumentation scope name |\n| scope_version | String | Instrumentation scope version |\n| scope_attributes | String | JSON-encoded scope attributes |\n| log_attributes | String | JSON-encoded log attributes |\n| dropped_attributes_count | UInt32 | Dropped log attributes |\n| flags | UInt32 | Log flags |\n\n### Traces Schema\n\n| Field | Type | Description |\n|-------|------|-------------|\n| start_time_unix_nano | TimestampNanosecond | Span start time |\n| duration_time_unix_nano | DurationNanosecond | Span duration |\n| trace_id | FixedSizeBinary(16) | Trace ID |\n| span_id | FixedSizeBinary(8) | Span ID |\n| parent_span_id | FixedSizeBinary(8) | Parent span ID |\n| trace_state | String | W3C trace state |\n| name | String | Operation name |\n| kind | Int32 | Span kind enum |\n| status_code | Int32 | Status code |\n| status_status_message | String | Status message |\n| service_name | String | Service name from resource |\n| service_namespace | String | Service namespace |\n| service_instance_id | String | Service instance ID |\n| scope_name | String | Instrumentation scope name |\n| scope_version | String | Instrumentation scope version |\n| scope_attributes | String | JSON-encoded scope attributes |\n| span_attributes | String | JSON-encoded span attributes |\n| resource_attributes | String | JSON-encoded resource attributes |\n| events_json | String | JSON-encoded span events |\n| links_json | String | JSON-encoded span links |\n| dropped_attributes_count | UInt32 | Dropped attributes count |\n| dropped_events_count | UInt32 | Dropped events count |\n| dropped_links_count | UInt32 | Dropped links count |\n| flags | UInt32 | Span flags |\n\n### Gauge Metrics Schema\n\n| Field | Type | Description |\n|-------|------|-------------|\n| time_unix_nano | TimestampNanosecond | Data point timestamp |\n| start_time_unix_nano | TimestampNanosecond | Start of measurement window |\n| name | String | Metric name |\n| description | String | Metric description |\n| unit | String | Unit of measurement |\n| int_value | Int64 | Integer metric value |\n| double_value | Float64 | Floating-point metric value |\n| service_name | String | Service name from resource |\n| service_namespace | String | Service namespace |\n| service_instance_id | String | Service instance ID |\n| resource_attributes | String | JSON-encoded resource attributes |\n| scope_name | String | Instrumentation scope name |\n| scope_version | String | Instrumentation scope version |\n| scope_attributes | String | JSON-encoded scope attributes |\n| metric_attributes | String | JSON-encoded metric attributes |\n| flags | UInt32 | Data point flags |\n| exemplars_json | String | JSON-encoded exemplars |\n\n### Sum Metrics Schema\n\nIncludes all gauge fields plus:\n\n| Field | Type | Description |\n|-------|------|-------------|\n| aggregation_temporality | Int32 | 1=Delta, 2=Cumulative |\n| is_monotonic | Boolean | Whether sum is monotonic |\n\n### Histogram Metrics Schema\n\nHistogram metrics use the common metric context fields above, plus `count`\n(`UInt64`), `sum`, `min`, `max`, typed `bucket_counts` (`List\u003cUInt64\u003e`),\ntyped `explicit_bounds` (`List\u003cFloat64\u003e`), `flags`, `exemplars_json`, and\n`aggregation_temporality`.\n\n### Exponential Histogram Metrics Schema\n\nExponential histograms use the common metric context fields above, plus\n`count` (`UInt64`), `sum`, `min`, `max`, `scale`, `zero_count` (`UInt64`),\n`zero_threshold`, typed positive/negative bucket-count lists, `flags`,\n`exemplars_json`, and `aggregation_temporality`.\n\n## Cargo Features\n\n| Feature | Description | Default |\n|---------|-------------|---------|\n| `default` | Core functionality | Yes |\n| `parquet` | Enable Parquet output | No |\n| `wasm` | Enable WASM bindings | No |\n\n## Performance\n\n- Transforms are plain Rust functions with no interpreter or runtime overhead\n- Arc-shared resource/scope values reduce memory allocations\n- Arrow columnar format enables efficient compression\n- Release builds use LTO and size optimization\n\n## License\n\nLicensed under either of:\n\n- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)\n- MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)\n\nat your option.\n\n## Contributing\n\nContributions welcome! Please ensure:\n\n1. All tests pass: `cargo test`\n2. Code is formatted: `cargo fmt`\n3. No clippy warnings: `cargo clippy -- -D warnings`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmithclay%2Fotlp2records","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsmithclay%2Fotlp2records","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmithclay%2Fotlp2records/lists"}