{"id":50453621,"url":"https://github.com/copyleftdev/anomalyx","last_synced_at":"2026-06-06T06:00:45.830Z","repository":{"id":361483292,"uuid":"1254601865","full_name":"copyleftdev/anomalyx","owner":"copyleftdev","description":"Contract-first anomaly detection across ~30 formats — logs, security telemetry (syslog, CEF/LEEF, EVTX, Suricata EVE, osquery, CloudTrail), network (pcap, NetFlow, DNS), observability (OTLP, Prometheus, journald), data (CSV/JSON/Parquet/SQLite/Avro/ORC). Deterministic Rust CLI; 9 detectors / 7 classes; NIST-validated, mutation-tested.","archived":false,"fork":false,"pushed_at":"2026-06-02T02:46:01.000Z","size":2521,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-04T04:13:34.611Z","etag":null,"topics":["anomaly-detection","cli","data-quality","deterministic","dfir","drift-detection","llm-tools","log-analysis","mutation-testing","netflow","observability","opentelemetry","outlier-detection","pcap","rust","security","siem","syslog","threat-hunting","time-series"],"latest_commit_sha":null,"homepage":"https://dev.to/copyleftdev/ai-tools-need-contracts-not-prompts-5ca3","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/copyleftdev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-30T19:22:07.000Z","updated_at":"2026-06-02T02:45:20.000Z","dependencies_parsed_at":null,"dependency_job_id":"53b
2d00d-a105-4d5b-aa99-14cda8c74f36","html_url":"https://github.com/copyleftdev/anomalyx","commit_stats":null,"previous_names":["copyleftdev/anomalyx"],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/copyleftdev/anomalyx","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/copyleftdev%2Fanomalyx","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/copyleftdev%2Fanomalyx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/copyleftdev%2Fanomalyx/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/copyleftdev%2Fanomalyx/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/copyleftdev","download_url":"https://codeload.github.com/copyleftdev/anomalyx/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/copyleftdev%2Fanomalyx/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33930311,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-05T02:00:06.157Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","cli","data-quality","deterministic","dfir","drift-detection","llm-tools","log-analysis","mutation-testing","netflow","observability","opentelemetry","outlier-detection","pcap","rust","secu
rity","siem","syslog","threat-hunting","time-series"],"created_at":"2026-06-01T01:04:34.689Z","updated_at":"2026-06-05T05:00:49.587Z","avatar_url":"https://github.com/copyleftdev.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# anomalyx\n\n[![crates.io](https://img.shields.io/crates/v/anomalyx.svg)](https://crates.io/crates/anomalyx)\n[![CI](https://github.com/copyleftdev/anomalyx/actions/workflows/ci.yml/badge.svg)](https://github.com/copyleftdev/anomalyx/actions/workflows/ci.yml)\n[![License: MIT OR Apache-2.0](https://img.shields.io/badge/license-MIT%20OR%20Apache--2.0-blue.svg)](#license)\n\nContract-first anomaly detection over arbitrary corpora — a CLI built on the\nthesis of [*AI Tools Need Contracts, Not Prompts*][article]: **the executable is\nthe contract.**\n\nanomalyx meets your data where it already lives. Point it at **~30 formats** —\nlogs, security telemetry, packet captures, flow records, observability streams,\nspreadsheets, and data-lake files — and it normalizes each into one typed record\nmodel, runs a battery of deterministic anomaly detectors, and returns a dense,\nversioned, machine-readable envelope an agent can trust — not pretty text it has\nto scrape.\n\n[article]: https://dev.to/copyleftdev/ai-tools-need-contracts-not-prompts-5ca3\n\n## The four-verb contract\n\n```text\nanomalyx describe                    # protocol metadata: what this is, formats, detectors\nanomalyx schema                      # JSON Schema of scan output (validate, don't guess)\nanomalyx scan [--baseline B] [PATH]  # normalize + detect → dense tq1 envelope\nanomalyx explain \u003cHANDLE\u003e [PATH]     # resolve a finding handle back to its evidence\n```\n\nWith `--baseline B`, the current corpus is compared against `B`: distributional\ndrift and schema-diff detectors activate. 
Without it they report honest absence\n(\"no baseline provided\"), and only single-corpus detectors run.\n\nWith `--period N`, rows are treated as an ordered time series and the contextual\n(seasonal-subseries) detector compares each point to its phase peers. Without a\nperiod it reports honest absence — seasonality is never guessed.\n\nWith `--cadence COL`, column `COL` is read as event times and assessed for\nmetronomic (automated) regularity. Without it the cadence detector is absent —\nwhich column means \"time\" is never guessed.\n\nWith `--columns C,..` (or its complement `--exclude C,..`) detection is scoped to\na chosen set of columns — the answer to identifier noise on wide corpora (e.g.\n`journalctl -o json | anomalyx scan --exclude JOB_ID,_PID,__REALTIME_TIMESTAMP`).\nThe scope is explicit, never a heuristic guess; an unknown column name is a hard\nerror so a typo can't silently scan nothing. See [scan modes](docs/src/modes.md).\n\nExit codes are part of the contract: **`0`** clean · **`1`** anomalies found ·\n**`2`** tool error.\n\n```console\n$ printf 'id,amount\\n1,10\\n2,11\\n3,9\\n4,10\\n5,12\\n6,11\\n7,10\\n8,9\\n9,9999\\n' | anomalyx scan\n{\"protocol\":\"anomalyx/tq1\",...,\"rows\":[[0,1,2,1.0,3,4544.43,4]],...,\"exit\":1}\n\n$ ... | anomalyx explain cell:amount:8\n{\"evidence\":{\"kind\":\"cell\",\"column\":\"amount\",\"row\":8,\"value\":{\"t\":\"int\",\"v\":9999}}, \"findings\":[...]}\n```\n\n## Design commitments (straight from the article)\n\n- **Typed dense output** — a versioned `tq1` envelope with a dictionary-pinned\n  string table and dense finding rows. Field changes are API changes.\n- **Determinism is UX** — order-independent (Kahan/Neumaier) reductions, no\n  wall-clock in the measurement path, a config-version fingerprint. 
Same input +\n  same fingerprint ⇒ byte-identical output.\n- **Honest absence** — a detector that can't run is recorded in `absent` with a\n  reason; a `Null` never silently becomes `0.0`; an unresolved handle fails\n  cleanly with exit `2`.\n- **Handle-based evidence** — `scan` stays compact; `explain` drills in.\n\n## Formats\n\n**32 built-in parsers** across five domains — each an independent plugin, each\nlowered to the same typed `RecordSet`:\n\n- **Tabular \u0026 structured** — CSV, TSV, NDJSON, JSON, YAML, TOML/INI, XML\n- **Columnar, data-lake \u0026 databases** — Parquet, Arrow IPC, Avro, ORC,\n  Excel/ODS, SQLite\n- **Logs \u0026 observability** — logfmt, web access logs, syslog (RFC 3164/5424),\n  systemd journal, Prometheus/OpenMetrics, OpenTelemetry (OTLP)\n- **Security telemetry** — Zeek, CEF/LEEF, auditd, EVTX (Windows Event Log),\n  Suricata/Zeek EVE, osquery, AWS CloudTrail\n- **Network** — PCAP/PCAPNG, NetFlow/IPFIX (nfdump CSV), AWS VPC Flow Logs,\n  DNS query logs\n\nSeveral parsers compute the features the detectors want — DNS query-name entropy\n\u0026 length, flow `duration`, span durations, normalized timestamps — and rename\ncryptic source fields to a canonical schema. So the same taxonomy lights up\nacross domains: **beaconing/C2** via `cadence` on PCAP inter-arrival times,\n**DGA/exfil** via `point` on DNS name entropy, **config drift** via\n`struct.schema` on YAML/TOML, **exfil** via `mv.mahalanobis` on NetFlow\n(bytes, packets, duration), **alert-type drift** via `dist.chi2` on EVE/CEF.\n\nResolution is by extension first, then deterministic content sniff (binary magic\nbefore text signatures); an unrecognized stream is an explicit error, never a\nguess. Binary/heavyweight parsers sit behind default-on feature flags, so\n`--no-default-features` yields a lean, text-only normalizer. 
Full table:\n[docs › Input \u0026 normalization](https://copyleftdev.github.io/anomalyx/formats.html).\n\n## Architecture\n\n```\ncrates/\n  ax-core       contract types: RecordSet, anomaly taxonomy, tq1 envelope,\n                handles, deterministic reductions  (no heavy deps — the contract\n                stays engine-independent and the mutation gate stays fast)\n  ax-normalize  any input format → RecordSet  (32 parser plugins — text via a\n                lean deterministic reader, binary/library-backed formats behind\n                default-on feature flags — all lowered to the same RecordSet so\n                detectors never see a library type. See \"Formats\" above)\n  ax-detect     Detector trait + registry; detection math assembled from\n                statrs, not reinvented\n  anomalyx      the four-verb CLI surface (the installable crate / binary)\n```\n\nInstall: `cargo install anomalyx`.\n\n## Anomaly taxonomy\n\nSeven classes, so an agent reasons about the *kind* of deviation:\n\n| Class | Meaning | Status |\n|---|---|---|\n| `point` | value far from its column's distribution (modified z / MAD) | ✅ `point.modz` |\n| `distributional` | the distribution shifted vs. a baseline (KS / PSI / χ²) | ✅ `dist.ks`, `dist.psi`, `dist.chi2` |\n| `structural` | schema / type / null-rate violation, baseline schema-diff | ✅ `struct.schema` |\n| `contextual` | anomalous only in context (seasonal subseries) | ✅ `ctx.seasonal` |\n| `collective` | a subsequence is jointly anomalous (level shift) | ✅ `coll.cusum` |\n| `multivariate` | a row isolated in feature space — breaks the joint structure | ✅ `mv.mahalanobis` |\n| `cadence` | suspiciously *regular* timing (automation) | ✅ `cad.regularity` |\n\n## Build vs. assemble\n\nDetection *math* is largely solved and is reused where it fits: `statrs`\n(distributions, χ² / KS p-values), `polars` (normalization). Where the\ndeterminism gate makes an off-the-shelf algorithm a liability — e.g. 
an\nisolation forest's RNG fights byte-reproducibility — anomalyx instead uses a\nfully deterministic method (the multivariate detector is Mahalanobis distance\nover a self-contained Cholesky solve, no RNG). What anomalyx *invents* is the\npart no crate provides: the executable contract — the envelope, the taxonomy +\nexplainable detector registry, cross-corpus drift orchestration, and the\ndeterminism guarantees.\n\n## Validation against NIST\n\nBeyond unit/property tests, the math core is checked against the **NIST\nStatistical Reference Datasets (StRD)** — the canonical, certified-to-15-digits\ntruth for univariate statistics. The datasets are vendored under\n`crates/ax-validate/data/strd/` (offline, reproducible) and scored by NIST's own\nlog-relative-error (number of correct significant digits):\n\n- `det::mean` reproduces every certified mean to **≥15 digits**; `det::std_dev`\n  to **≥13** on well-conditioned data.\n- On the `NumAcc3`/`NumAcc4` precision torture tests (mean ≈ 1e6–1e7, spread 0.1)\n  the compensated two-pass holds **8–9 correct digits** where the textbook\n  one-pass variance gets **zero** — a checked demonstration that the determinism\n  design is load-bearing, not decorative.\n\nStress tests add ground-truth anomaly recovery (planted outliers flagged with\nno false positives/negatives), order-independence on real 5000-point data, and\nbyte-identical reproducibility on a 40k-row scan.\n\n## The strong gates\n\nTwo load-bearing test gates, run by `scripts/gates.sh`:\n\n1. **Property-based testing** (`proptest`) — pins invariants across all inputs:\n   shift/scale/permutation invariance and determinism for the point detector,\n   round-trips and idempotence for the contract types.\n2. **Mutation testing** (`cargo-mutants`) — proves those tests have teeth. The\n   gate is **0 surviving mutants** on `src/`. 
Provably-equivalent mutants are\n   documented individually in `.cargo/mutants.toml`, never blanket-suppressed.\n\n```console\n$ ./scripts/gates.sh          # fmt · clippy -D warnings · tests · mutants==0\n```\n\nCurrent status: workspace builds clean, `clippy -D warnings` passes, all tests\ngreen, and every crate passes the 0-survivor mutation gate. CI\n(`.github/workflows/ci.yml`) runs fmt/clippy/test on every push; the mutation\ngate runs locally via `./scripts/gates.sh` (it takes too many minutes to run in CI).\n\n## License\n\nLicensed under either of [Apache License, Version 2.0](LICENSE-APACHE) or\n[MIT license](LICENSE-MIT) at your option. Unless you explicitly state\notherwise, any contribution intentionally submitted for inclusion in this work\nshall be dual licensed as above, without any additional terms or conditions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcopyleftdev%2Fanomalyx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcopyleftdev%2Fanomalyx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcopyleftdev%2Fanomalyx/lists"}