{"id":49737250,"url":"https://github.com/qazaq-ai/adam","last_synced_at":"2026-06-11T08:00:53.720Z","repository":{"id":356692372,"uuid":"1203520715","full_name":"qazaq-ai/adam","owner":"qazaq-ai","description":"Neurosymbolic AI on Kazakh agglutinative morphology — typed Composition → Frame → QueryIR → FrameIndex → realiser pipeline. 0% GPU, 0 MB model, 791 ns warm latency on one CPU core. Architecturally hallucination-free within curated-domain coverage. ~30 catalogued agglutinative languages. Pure Rust, BUSL-1.1.","archived":false,"fork":false,"pushed_at":"2026-06-07T11:57:58.000Z","size":315625,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-07T12:23:15.029Z","etag":null,"topics":["agglutinative","ai","cpu-only","deterministic-ai","edtech","explainable-ai","forward-chaining","frame-semantics","fst","kazakh","kazakh-language","kazakh-nlp","knowledge-graph","llm-alternative","neurosymbolic","rust","symbolic-ai","tutor","typed-fact-graph","typed-pipeline"],"latest_commit_sha":null,"homepage":"https://github.com/qazaq-ai/adam","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qazaq-ai.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"docs/roadmap.md","authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":"codemeta.json","zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-07T05:40:15.000Z","updated_at":"2026-06-07T11:58:00.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/qazaq-ai/adam","commit_stats":null,"previous_names":["qazaq-ai/adam"],"tags_count":562,"template":false,"template_full_name":null,"purl":"pkg:github/qazaq-ai/adam","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qazaq-ai%2Fadam","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qazaq-ai%2Fadam/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qazaq-ai%2Fadam/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qazaq-ai%2Fadam/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qazaq-ai","download_url":"https://codeload.github.com/qazaq-ai/adam/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qazaq-ai%2Fadam/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34188272,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agglutinative","ai","cpu-only","deterministic-ai","edtech","explainable-ai","forward-chaining","frame-semantics","fst","kazakh","kazakh-language","kazakh-nlp","knowledge-graph","llm-alternative","neurosymbolic","rust","symbolic-ai","tutor","typed-fact-graph","typed-pipeline"],"created_at":"2026-05-09T10:28:35.346Z","updated_at":"2026-06-11T08:00:53.713Z","avatar_url":"https://github.com/qazaq-ai.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/shanraq.svg\" alt=\"adam logo\" width=\"128\" height=\"128\"\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eadam\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ci\u003eNeurosymbolic agglutinative AI — typed, deterministic, watch-class fast.\u003c/i\u003e\u003cbr\u003e\n  \u003ci\u003eKazakh-first applied demonstrator of the Qazaq IR architecture.\u003c/i\u003e\u003cbr\u003e\n  \u003ci\u003eҚазақ тіліне арналған, толық болжамды диалог жүйесі — таза Rust тілінде.\u003c/i\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cb\u003eWhy this project exists →\u003c/b\u003e \u003ca href=\"docs/MANIFESTO.md\"\u003e\u003ccode\u003edocs/MANIFESTO.md\u003c/code\u003e\u003c/a\u003e\u003cbr\u003e\n  \u003cb\u003ev6.2 architecture →\u003c/b\u003e \u003ca href=\"docs/v6_2_architectural_redesign.md\"\u003e\u003ccode\u003edocs/v6_2_architectural_redesign.md\u003c/code\u003e\u003c/a\u003e\u003cbr\u003e\n  \u003cb\u003eHow adam compares to LLMs →\u003c/b\u003e \u003ca href=\"docs/COMPARISON.md\"\u003e\u003ccode\u003edocs/COMPARISON.md\u003c/code\u003e\u003c/a\u003e\u003cbr\u003e\n  \u003cb\u003eDue-diligence pack →\u003c/b\u003e \u003ca href=\"DUE_DILIGENCE.md\"\u003e\u003ccode\u003eDUE_DILIGENCE.md\u003c/code\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003c!-- ─────────────────────────────────────────────────────────────\n     v6.3 honest split (2026-06-02 codex audit follow-up).\n     Production = v6.2 on `main` (numbers below).\n     Research-demo = v6.3 on experimental/v6_3_phonemic_foundation.\n     ───────────────────────────────────────────────────────────── --\u003e\n\n\u003e **Two tracks, on purpose.**\n\u003e\n\u003e - **`main` — v6.2 deterministic core.**  Rule-based dialog kernel:\n\u003e   morph → frame → QueryIR → retrieval → realiser, plus rc15 safety\n\u003e   guard and rc18 OOD discipline.  **314 MB peak RSS, 0 GPU, 0\n\u003e   network, byte-deterministic.**  Blind eval: **97 / 100** on the\n\u003e   curated Kazakh battery.  Production binaries (voice REPL,\n\u003e   `adam_blind_eval`, `adam_chat`) opt in via `ADAM_V6_2=1`; the\n\u003e   Rust library default stays OFF pending a v6.2 sibling test\n\u003e   suite — the rc19/rc20 default-flip attempt surfaced ~20\n\u003e   v6.1-cascade-specific regression tests\n\u003e   (live_holdout_*, factual_eval_100, end_to_end self-intro /\n\u003e   cross-slot / border-fact, adversarial_dialog_v1,\n\u003e   curriculum_v4995_*) whose assertions check v6.1 wording rather\n\u003e   than v6.2 behaviour.  Migrating those is a v6.6+ arc.\n\u003e\n\u003e - **`tools/voice_repl_v6_3` — v6.3 voice surface.**\n\u003e   Wraps the v6.2 core in a microphone → STT → fuzzy / LM rescoring →\n\u003e   intent classifier → router → TTS loop.  **Honest hybrid:** the\n\u003e   dialog core stays 100 % deterministic; neural components (Whisper.cpp\n\u003e   STT, Piper TTS, ~1 M-param BPE LM, ~1 M-param intent classifier)\n\u003e   live only at the speech surface and never invent facts.\n\u003e\n\u003e See \u003ca href=\"DUE_DILIGENCE.md\"\u003e\u003ccode\u003eDUE_DILIGENCE.md\u003c/code\u003e\u003c/a\u003e\n\u003e for current test totals, repo state, known limitations and\n\u003e reproducibility commands across both tracks.\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/qazaq-ai/adam/releases\"\u003e\u003cimg src=\"https://img.shields.io/badge/version-6.5.0--rc25-2EA44F?style=for-the-badge\" alt=\"version\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/qazaq-ai/adam/actions/workflows/rust.yml\"\u003e\u003cimg src=\"https://img.shields.io/github/actions/workflow/status/qazaq-ai/adam/rust.yml?branch=main\u0026style=for-the-badge\u0026label=CI\" alt=\"CI\"\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-BUSL%201.1-orange?style=for-the-badge\" alt=\"license\"\u003e\u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/badge/language-Rust-CE412B?style=for-the-badge\u0026logo=rust\u0026logoColor=white\" alt=\"rust\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/platform-macOS%20%7C%20Linux-lightgrey?style=for-the-badge\" alt=\"platform\"\u003e\n  \u003ca href=\"https://github.com/qazaq-ai/adam/commits/main\"\u003e\u003cimg src=\"https://img.shields.io/github/last-commit/qazaq-ai/adam?style=for-the-badge\" alt=\"last commit\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/qazaq-ai/adam/stargazers\"\u003e\u003cimg src=\"https://img.shields.io/github/stars/qazaq-ai/adam?style=for-the-badge\" alt=\"stars\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/tests-2500%20passing%20%2F%200%20failed%20%2F%2027%20ignored-2EA44F?style=flat-square\" alt=\"tests\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/dialog%20battery-79%2F79%20must--pass-2EA44F?style=flat-square\" alt=\"dialog battery\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/production%20p50-13.6%20ms-2EA44F?style=flat-square\" alt=\"production p50\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/production%20p95-19.6%20ms-2EA44F?style=flat-square\" alt=\"production p95\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/peak%20RSS-314%20MB-2EA44F?style=flat-square\" alt=\"peak RSS\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/0%20GPU%20%2F%200%20network-2EA44F?style=flat-square\" alt=\"0 GPU / 0 network\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/v6.3%20voice%20surface-Whisper%20%2B%20Piper%20%2B%20tiny%20LM-9CCC65?style=flat-square\" alt=\"v6.3 voice\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/world%20core-3444%20curated%20/%204116%20facts-9CCC65?style=flat-square\" alt=\"world core\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/lexicon-25.5%20k%20roots-FBC02D?style=flat-square\" alt=\"lexicon\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/intents-41%20router%20%2F%2052%20neural%20classifier-2EA44F?style=flat-square\" alt=\"intents\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/hallucinations%20within%20curated%20domains-0-2EA44F?style=flat-square\" alt=\"hallucinations within curated domains\"\u003e\n\u003c/p\u003e\n\n---\n\n## What's new in v6.5.0\n\n**Blind eval at 97 % + Kazakh-only templates in v6.2 cascade.**\nrc14 (2026-06-10) shipped the blind-eval scoreboard the external\naudit recommended; six iterations later (rc15 safety guard, rc16\nenvironment alignment, rc17 factual + refusal patterns, rc18 OOD\ndiscipline, rc19 doc cleanup, rc20 cognitive_eval Kazakh-only\ntemplates) the curated Kazakh battery sits at **97 / 100**.\nThe Rust-level `ADAM_V6_2` default stays OFF pending a v6.6+\nv6.2-sibling test suite arc; production binaries opt in via the\nexisting env-var.  Voice REPL is the immediate beneficiary —\nPiper Kazakh TTS now reads adam's «what can I help with» template\ncleanly aloud (no «curated» / «Rust» / «LLM» / «live-feed» /\n«ASCII» tokens).\n\n## What's new in v6.2.0\n\n**Neurosymbolic agglutinative algebra.** The architectural redesign\npromised at v6.1.50 lands as the new\n[`adam-algebra`](crates/adam-algebra) crate plus an integration\nbridge in `adam-dialog::v6_2_router`.  Ships opt-in behind\n`ADAM_V6_2=1`; the production binaries (voice REPL, `adam_chat`,\n`adam_blind_eval`) set it at startup.  Library default flip is a\nv6.6+ arc — see the two-tracks note at the top of this file.\n\nThe full v6.2 pipeline runs as **pure typed-data manipulation**:\n\n```text\ninput → morph lattice → Composition[] → Frame → QueryIR\n                                          ↓\n                              FrameIndex + math_solver + system_clock\n                                          ↓\n                                       realiser → output\n```\n\n### Honest numbers (release build, M2 Air)\n\nadam runs in three latency classes.  All numbers are real and\nreproducible from this repo; **don't conflate the classes** when\ncomparing to other systems.\n\n| Class | What it measures | Median | p95 | Memory |\n|---|---|---|---|---|\n| **A. Typed kernel micro-path** | `adam-algebra` Stage 3: typed Frame → Index → answer.  Excludes STT, dialog router, NLG, anaphora resolution. | **~470 ns** | ~600 ns | \u003c50 MB RSS |\n| **B. Production dialog cascade** | `adam-dialog` 30-query real battery: morph → semantics → router → retrieval → reasoning → realiser.  Excludes STT/TTS. | **13.6 ms** | 19.6 ms | **314 MB RSS** |\n| **C. Full voice loop** (v6.3 voice REPL) | Mic capture → Whisper STT → cascade → Piper TTS.  Hot path for the audio interactive flow. | not measured (interactive) | n/a | + 2.3 GB STT + 1.1 GB TTS on disk |\n\nClass A is the typed-algebra micro-benchmark — useful for comparing\nkernel versions to each other, **not** for comparing adam to a full\nNLP system.  Class B is the honest production number — what a real\ntext query through `adam_chat` looks like.  Class C is the voice\ndemonstrator and depends on Whisper / Piper sizes.\n\n#### How this compares to LLMs\n\n**Different system class — not a head-to-head replacement.**  Llama\n3.1 / Claude / GPT cover 128K–1M token context across hundreds of\nlanguages and broad open-domain dialogue.  adam covers a narrow\ndeterministic Kazakh-first reasoning surface on curated facts.\nadam is faster, smaller, and 0-GPU **only because the problem\nscope is narrower**.\n\nWhat adam offers that LLMs cannot:\n- **Byte-deterministic answers** — same input → same output, no\n  sampling, no temperature, regression-testable in CI.\n- **0 GPU, 0 network** — runs offline on M2 Air, suitable for\n  airgapped / regulated / restricted environments.\n- **Auditable provenance** — every fact ships from\n  `data/world_core/` JSONL files; no hidden training corpus.\n- **Native Kazakh morphology** — FST-grounded analyses, not\n  byte-pair guessed.\n\nWhat adam does NOT offer (and is not trying to):\n- Broad open-domain general knowledge across many languages.\n- Long-context reasoning at LLM scale.\n- Creative / open-ended generation.\n\nFor a fair comparison, a same-task blind eval (300–1000 Kazakh\nqueries across factual / OOD / safety / tutor / multi-turn) is\nneeded.  See [v6.5 audit-to-training loop](docs/training_runbook.md)\nfor the current methodology.\n\n**Published-benchmark anchor (rc21 addition).**\nMost-cited Kazakh academic benchmark is **KazMMLU** ([arXiv:2502.12829](https://arxiv.org/abs/2502.12829),\n23 k Kazakh + Russian MCQ across STEM / humanities / social):\n\n| Model | KazMMLU avg | Resource cost |\n|---|---|---|\n| **adam** | not yet run on KazMMLU (97 / 100 on own 100-item battery) | **314 MB RSS, 0 GPU, $0 / query** |\n| GPT-4o | 76.6 % | API only; $2.50 / $10.00 per 1 M tok |\n| Llama 3.1 70B | 56.2 % | ~140 GB FP16, 2-4× H100 |\n| Llama 3.1 8B | 39.7 % | ~16 GB FP16, 1× A100 |\n| Claude (3.5 / 3.7 / 4) | **no Kazakh-specific score published** | API only; $3.00 / $15.00 per 1 M tok |\n\nSee [`docs/COMPARISON.md`](docs/COMPARISON.md) for the full table\n(hallucination rate, FLORES coverage, Sherkala-tuned Llama, etc.)\nplus honest gaps — chiefly that **adam has not yet been run on\nKazMMLU itself**; that's the rc22+ apples-to-apples step.\n\n### What ships in `adam-algebra` (8 modules, 195 tests)\n\n- **Stage 1 — `operator` + `root` + `composition`** — typed\n  agglutinative algebra. `Root × SuffixOp[]` with slot bookkeeping\n  (case / number / possessive / tense / voice / negation / polite).\n  Round-trip with `adam_kernel_fst::Analysis` byte-stable.\n- **Stage 2 — `frame`** — `Frame { agent, predicate, object,\n  modifiers, modality, polarity, evidentiality, tense, aspect }`.\n  Subsumes v6.1's scattered `Fact` / `SentenceFrame` /\n  `SentenceDecomposition` / `Claim`. 22 typed predicates.\n- **Stage 3 — `query`** — typed «frame with a hole»:\n  `QueryIR { …, focus, form, answer_shape, sense_hints,\n  domain_filter, language_filter }`. Subsumes v6.1's\n  `PredicateFocus` + `QuestionShape` + `AnswerShape` + `Intent`\n  + slot_inventory.\n- **Stage 4 — `index`** — `FrameIndex` with 5 secondary indexes\n  (predicate / agent / object / modifier / domain / language).\n  Insert O(M), query O(min-index-size). Property-tested against\n  brute-force linear filter.\n- **Stage 4.5 — `dialog_battery`** — 79 real Kazakh / Russian\n  REPL questions across 14 domains (biographical / geographical /\n  institutional / legal / sense-ambiguous / МО РК / programming /\n  Rust / math / system-clock / Soviet+Kazakh historical dates).\n  **CI quality gate: 79/79 must-pass, 0 known gaps, 0 regressions.**\n- **Stage 4.6+ — `math_solver`** — pure-function deterministic\n  math in Russian + Kazakh + ASCII. Numbers 0-99 in both\n  languages, operators (+ − × ÷ × ÷ mod ^ %), trig (sin / cos /\n  tan / arcsin / arccos / arctan), constants π / e, square root,\n  decimals. Left-to-right chained-imperative semantics.\n- **Stage 5 — sense disambiguation** — cross-shape\n  `TimeAnchor::Year(N)` ↔ phrase «N жыл» matching; bilingual\n  language-tag filter for same-root same-domain ambiguity\n  («гравитация» / «фотосинтез» / «днк» KZ vs RU).\n- **Stage 7.1 — `corpus_loader`** — `load_world_core(path)`\n  reads `data/world_core/*.jsonl`. **65 files → 3 444 entries →\n  4 116 facts → 0 dropped** (v6.3.0-rc2 live counts).\n- **Stage 7.2 — `realiser`** — typed `Frame → Kazakh surface`.\n  Pure function. Every v6.1 NLG rule family expressible.\n- **Stage 7.3 — `system_clock`** — OS wall-clock provider with\n  hand-rolled Kazakh calendar (no chrono dep).\n\n### What ships in `adam-dialog::v6_2_router`\n\n- `is_v6_2_active()` — reads `ADAM_V6_2` env var (default OFF; production binaries set `ADAM_V6_2=1` at startup; v6.5.0-rc19 deferred the library-wide default flip until v6.1→v6.2 test regressions close in rc20+).\n- `answer(input) -\u003e Option\u003cString\u003e` — top-level entry. Routes:\n  math_solver / system_clock / FrameIndex + realiser.\n- Lazy `OnceLock` corpus from `data/world_core/*.jsonl` +\n  battery-augmented Russian aliases.\n- `tests/v6_2_integration.rs`: 11 real Kazakh / Russian questions\n  end-to-end. All pass.\n\n### Live REPL demo\n\n```bash\n# Interactive (math, time, retrieval — all routed):\ncargo run --release --example chat -p adam-algebra\n\n# Latency bench + dialog battery report:\ncargo run --release --example bench_pipeline -p adam-algebra\n\n# Quality gate:\ncargo test -p adam-algebra dialog_battery_meets_quality_gate -- --nocapture\n\n# Integration through adam-dialog router (env-gated):\nADAM_V6_2=1 cargo test -p adam-dialog --test v6_2_integration\n```\n\n### Default-ON discipline (v6.5.0-rc19+)\n\n`ADAM_V6_2` is **default ON** since v6.5.0-rc19.  The flip happened\nafter the blind eval scoreboard (`adam_blind_eval`, rc14) crossed\nthe ≥90 % bar — final pre-flip number was **97 / 100** on rc18.\nThe v6.1 cascade is preserved as an escape hatch: set\n`ADAM_V6_2=0` (or `false` / `off` / `no`) to fall back.\n\nSee [`CHANGELOG.md` § 6.2.0](CHANGELOG.md) for the full inventory\nand [`docs/v6_2_architectural_redesign.md`](docs/v6_2_architectural_redesign.md)\nfor the design doc that preceded the work.\n\n---\n\n## 30-second pitch\n\n\u003e **adam is a neurosymbolic AI research kernel** built on the\n\u003e agglutinative morphology of Kazakh. Every output traces to a\n\u003e curated source via a typed pipeline (Composition → Frame →\n\u003e QueryIR → FrameIndex → realiser). Three structural advantages\n\u003e over LLMs by **construction**:\n\u003e\n\u003e 1. **Predictability** — every claim cites a curated `(pack,\n\u003e    sample_id)` provenance; the cascade is byte-identical given\n\u003e    `(input, seed, world_core)`.\n\u003e 2. **Cheapness** — single Rust binary, **314 MB peak RSS** on the\n\u003e    full 30-query production cascade, **0 % GPU**, **0 network**.\n\u003e    Runs offline on an M2 Air at **13.6 ms p50 / 19.6 ms p95**\n\u003e    per query (production cascade — see *Honest numbers* section\n\u003e    above for the three latency classes).  Voice loop adds STT\n\u003e    (~2.3 GB) + TTS (~1.1 GB) on disk.\n\u003e 3. **Architectural hallucination-freedom within curated domains** —\n\u003e    every fact-bearing reply emits from a `FrameIndex` hit + Stage 7\n\u003e    realiser, or falls through to v6.1 cascade / honest refusal. No\n\u003e    probabilistic free generation anywhere in the answer path.\n\u003e    *Scope caveat:* «0 hallucinations» holds for queries the\n\u003e    curated corpus covers; off-corpus queries route to v6.1\n\u003e    fallback (which can still misroute on edge cases — see the\n\u003e    open Stage 8 backlog) or to «нет данных».\n\u003e\n\u003e Designed to extend across ~30 catalogued agglutinative\n\u003e languages — currently demonstrated on Kazakh; cross-language\n\u003e ports are a research goal, not a shipped capability. Pure Rust.\n\u003e BUSL-1.1.\n\n\u003e **Reading order:** [MISSION.md](MISSION.md) (thesis) →\n\u003e [v6.2 design doc](docs/v6_2_architectural_redesign.md) (current\n\u003e architecture) → [RESEARCH.md](RESEARCH.md) (open questions) →\n\u003e [COLLABORATION.md](COLLABORATION.md) (partner terms) →\n\u003e [AGENTS.md](AGENTS.md) (orientation for automated scouts) →\n\u003e [CHANGELOG.md](CHANGELOG.md) (full release history).\n\n## Why neurosymbolic (not LLM)\n\nModern LLMs carry three structural problems we treat as **not\ninevitable**. v6.2's *neurosymbolic* architecture means: neural\ncomponents produce **typed closed-set** outputs only (sense\ndisambiguation candidates, focus detection, intent class); a\ndeterministic verifier owns truth. No free text generation\nanywhere in the answer path.\n\n| The three diseases of probabilistic AI | adam's target | How v6.2 enforces it |\n|---|---|---|\n| **Black box** — opaque internals, no source attribution | **Predictability** — every claim traceable | `FrameIndex.query` returns a `RankedFrame` with `FrameId`; `world_core/*.jsonl` is the only fact source; every realised sentence is a function of `(Frame, focus, slot)` |\n| **Resource cost** — billions of params, GPU clusters | **Cheapness** — single binary | 314 MB RSS, 0 % GPU, 0 network, 13.6 ms p50 production cascade on M2 Air (full 30-query battery) |\n| **Hallucination risk** — confident generation of plausible-sounding wrong content | **Safety** — architectural impossibility | Realiser is a typed pure function over a curated Frame; if the index returns `None`, the router falls through to v6.1 cascade (no invention) |\n\n**Hypothesis:** agglutinative languages — Kazakh in particular —\nexhibit unusually mathematical morphology. Every word decomposes\ninto a root + a typed suffix chain (case, number, tense, person,\npossessive, polarity, modality). Composition is **rule-bound**.\nThat structure becomes the substrate for a typed runtime: FST\nmorphology produces `Composition`, the algebra layer lifts it\ninto `Frame`, retrieval typed via `QueryIR`, output realised as\npure-function `Frame → surface`. **No probabilistic free\ngeneration. No retrained-from-scratch behaviour per release.**\n\n## Quick start\n\n```bash\n# v6.2 live REPL demo (math + clock + retrieval routed):\ncargo run --release --example chat -p adam-algebra\n\n# v6.2 latency bench + dialog battery quality report:\ncargo run --release --example bench_pipeline -p adam-algebra\n\n# v6.2 integration test through adam-dialog router:\nADAM_V6_2=1 cargo test -p adam-dialog --test v6_2_integration\n\n# v6.1 cascade (default, unchanged):\ncargo build --release -p adam-dialog --bin adam_chat\n./target/release/adam_chat\n./target/release/adam_chat --once \"Қасқыр — тірі ме?\"\n./target/release/adam_chat --trace\n./target/release/adam_chat --tts\n\n# FST synthesiser + analyser:\ncargo run --release -p adam-kernel-fst --bin adam_fst -- synth --root бала --plural --case dat\n# → балаларға\n\n# Full foundation validation (~30 s on M2):\nbash ./scripts/validate_foundation.sh\n```\n\n## Architecture — ARK (Agglutinative Reasoning Kernel)\n\nThree pillars:\n\n- **A**gglutinative — Kazakh morphology decomposes deterministically (root + typed suffixes); composition is rule-bound, not learned.\n- **R**easoning — a curated knowledge graph ([`data/world_core/*.jsonl`](data/world_core)) + a forward-chaining reasoner (10 active rules) produces every fact-bearing claim. Every output cites a source.\n- **K**ernel — system-runtime, not a probabilistic estimator. ARK has small trained components (selection-weights perceptron, suffix-chain priors, root-affinity PMI) but they sit inside the kernel as inspectable layers, not at the centre.\n\n### v6.2 typed pipeline (`adam-algebra`)\n\n```text\ninput ─▶ FST lattice ─▶ Composition ─▶ Frame ─▶ QueryIR ─▶ FrameIndex ─▶ Realiser ─▶ output\n        (kernel-fst)    (operator +    (semantic  (frame    (typed       (Frame →\n                         root +         record)    with a    retrieval)   Kazakh)\n                         composition)              hole)\n```\n\n### v6.1 cascade (default, unchanged)\n\n```text\ninput ─▶ parser ─▶ semantics ─▶ [ retrieval + compose ] ─▶ planner ─▶ realiser ─▶ FST synth ─▶ output\n        (Layer 1) (Layer 2)       (Layer 2.5–2.75)       (Layer 3)   (Layer 4)   (Layer 5)\n```\n\nNo transformer. No embeddings. No probabilistic generation. For any input, a developer can dump every layer's state and audit why the model chose what it said.\n\n### Crates\n\n| Layer | Crate | Role |\n|---|---|---|\n| **L0** | [`adam-kernel`](crates/adam-kernel) | Core identity + foundation contracts |\n| **L0** | [`adam-kernel-fst`](crates/adam-kernel-fst) | FST morphology — phonology, morphotactics, synthesiser + parser, 25.5 k-entry Lexicon |\n| **L0.5** | [`adam-algebra`](crates/adam-algebra) | **v6.2** — typed neurosymbolic stack: agglutinative algebra (Composition / Frame / QueryIR), FrameIndex retrieval, realiser, math_solver, system_clock, corpus_loader. Source of the 79-case real-Kazakh dialog battery + CI quality gate. |\n| **L1** | [`adam-tokenizer`](crates/adam-tokenizer) | Pre-tokenizer + BPE trainer + encoder |\n| **L1** | [`adam-corpus`](crates/adam-corpus) | Source acceptance, streaming processors, synthetic generator, `corpus_audit`, `morpheme_coverage` |\n| **L1** | [`adam-eval`](crates/adam-eval) | Evaluation suite + benchmark reports |\n| **L1** | [`adam-dialog`](crates/adam-dialog) | Dialog pipeline — 41 intents, multi-turn session + DST, template planner, slot-expanding realiser, voice output transducer. **v6.2:** `v6_2_router` integration bridge (env-gated). |\n| **L1** | [`adam-retrieval`](crates/adam-retrieval) | Retrieval engine — morpheme inverted index, deterministic ranking, in-sample composition |\n| **L1** | [`adam-reasoning`](crates/adam-reasoning) | Reasoning engine — typed-fact graph, 10 active forward-chaining rules, `extract_facts` / `build_lexical_graph` / `run_reasoner` binaries |\n| **L1** | [`adam-scaling`](crates/adam-scaling) | Tier-by-tier scaling bench across the corpus |\n| **L1** | [`adam-train`](crates/adam-train) | Legacy transformer baseline, preserved as regression reference |\n\nEvery layer outputs deterministic, regression-tested JSON artifacts. `bash ./scripts/validate_foundation.sh` runs the full foundation validation end-to-end. See [`docs/architecture_v3.md`](docs/architecture_v3.md) for the canonical v6.1 architecture reference and [`docs/v6_2_architectural_redesign.md`](docs/v6_2_architectural_redesign.md) for the v6.2 design.\n\n## Demo — v6.2 live REPL\n\n```\n$ cargo run --release --example chat -p adam-algebra\n=== adam ARK — live REPL ===\n(deterministic; CPU-only; 0 MB model; no LLM)\nКурс: 65 curated facts. Print a question on each line; type «exit» to quit.\n\n? Ахмет Байтұрсынұлы қашан туылған?\n\u003e 1872     (25 417 ns)\n\n? Двадцать пять умножь на 7 раздели на два прибавь три\n\u003e 90.5     (15 667 ns)\n\n? Корень из шестнадцати\n\u003e 4        (5 334 ns)\n\n? Что такое гравитация?\n\u003e сила притяжения масс       (14 458 ns)        # Russian — language-filtered\n\n? Гравитация деген не?\n\u003e массалардың бір-бірін тарту күші   (12 208 ns) # Kazakh — language-filtered\n\n? Бүгін айдың нешесі?\n\u003e 25       (live clock)\n\n? Қазір қандай ай?\n\u003e мамыр    (live clock)\n```\n\n**Every answer is typed-data manipulation** — no template, no\nfree generation, no LLM call, no GPU. Bilingual disambiguation\nworks on same-root words («гравитация» appears in both KZ and RU\ncorpus with different definitions; `language_filter` resolves\nwhich surfaces).\n\nFor a full evidence dump on any Kazakh root, run [`adam_inspect`](crates/adam-dialog/src/bin/adam_inspect.rs).\n\n## What's measurable\n\n| Metric | Value | Notes |\n|---|---|---|\n| Workspace tests | **2 339 passing / 0 failed / 27 ignored** | v6.3.0-rc2 on M2 (`cargo test --release --workspace --locked`); v6.2.0 had 1 904 passing — v6.3 added voice REPL + neural intent classifier + replay battery + Phase 21 calendar tests |\n| Release cadence | **540+ versioned releases since 2026-04-07** | every release CI-verified |\n| v6.2 dialog battery | **79/79 must-pass, 0 gaps** | 14 real-Kazakh domains; CI quality gate via `dialog_battery_meets_quality_gate` |\n| Production cascade latency (M2 Air, 30-query battery) | **13.6 ms p50** / 19.6 ms p95 / 314 MB peak RSS | full `adam-dialog` cascade — morph → router → retrieval → reasoning → realiser; excludes STT/TTS |\n| Stage 3 typed-kernel micro-path | **~470 ns avg** | `adam-algebra` Stage 3 only — algebraic core; NOT a fair head-to-head against full NLP systems |\n| v6.2 throughput (single core) | **~1.26 M queries / sec** | scales linearly across cores |\n| v6.2 model size | **0 MB** | pure typed-data manipulation |\n| v6.1 cascade p50 turn latency | **~21 ms** | vs Llama-3 8B fp16 800–1500 ms; vs GPT-4 50–200 ms |\n| Memory footprint | **~300 MB RSS** | both cascades; vs LLM 16+ GB VRAM |\n| GPU usage | **0 %** | vs LLM dedicated GPU |\n| Hallucination rate (curated-domain coverage) | **0 %** architectural | verified by graph admissibility tests + Stage 4 quality gate. Off-corpus queries fall through to v6.1 cascade or honest «нет данных» — they cannot invent facts, but they can misroute. Stage 8 (HumanDialogEval) measures the off-corpus surface |\n| Lexicon roots | **25.5 k** | 13.6 k pure Kazakh + 11.9 k Apertium imports |\n| Curated entries (`world_core/`) | **3 461 entries / 4 044 frames** | across 66 domains; `validate_world_core` enforced in CI |\n| Curated facts total (incl. v6.1 augmentation) | **4 116** | `world_core` + bilingual aliases + МО РК + historical |\n| Derived facts | **37 991** | from 10 forward-chaining rules over the curated graph |\n| Dialog intents (v6.1 cascade) | **41** | template planner with `{slot\\|features}` FST-aware syntax |\n\nSee [`docs/performance.md`](docs/performance.md) for the full performance report and [`docs/scaling_report.md`](docs/scaling_report.md) for the per-tier scaling bench.\n\n## FAQ\n\n**Is this a wrapper around an LLM?** No. There is no LLM, no neural network at the answer path, no API call to OpenAI / Anthropic / Google. v6.1 inference is FST + forward-chaining reasoner over a typed-fact graph; v6.2 inference is the typed Composition → Frame → QueryIR → FrameIndex → realiser pipeline.\n\n**Is it really deterministic?** Yes. In v6.1, the only source of randomness is `planner::choose_template` (pin the seed → byte-identical output). In v6.2, the answer path is entirely pure-function: `FrameIndex.query(QueryIR)` returns frames in `(score desc, frame_id asc)` order, `realiser::realise(frame, focus, slot)` is a pure mapping; same input → byte-identical surface.\n\n**What is \"neurosymbolic\" supposed to mean here?** Per the [v6.2 design doc](docs/v6_2_architectural_redesign.md): neural components — when they exist (Stage 6, not in v6.2.0) — produce **typed closed-set** outputs only (intent class, sense disambiguation candidates, retrieval ranker score). They never generate free text. The verifier owns truth and rejects any candidate without a `FrameIndex` provenance. v6.2.0 ships the **symbolic half** of the architecture (typed algebra + retrieval + realiser); Stage 6 will add learned closed-set components inside this scaffolding.\n\n**Why Kazakh?** Kazakh's agglutinative morphology is exceptionally regular: every word decomposes into root + typed suffixes (case, number, tense, person, possessive, polarity, modality), each contributing a known operator. Composition is rule-bound, not learned. This is the cleanest substrate we know of for a deterministic AI runtime.\n\n**Will it generalise to other languages?** The architecture is *designed* for it but not yet *demonstrated* on a second language. ~30 candidate agglutinative languages are catalogued in [MISSION.md](MISSION.md#agglutinative-languages--global-research-frontier); first port (Karakalpak or Kyrgyz) is on the v6 research roadmap with measured porting cost as a deliverable. Treat the multi-language story as a research goal, not a current product capability.\n\n**What is the funding model?** Two parallel tracks: angel pre-seed private capital ($200K–300K target) and state research grants from agglutinative-language country research agencies (Japan JST/JSPS, South Korea NRF, Finland Academy of Finland, Turkey TÜBİTAK, Hungary NKFIH, Estonia ETAg, Uzbekistan, Kyrgyzstan, Mongolia, Tatarstan). See [COLLABORATION.md](COLLABORATION.md).\n\n**Who built this?** [Daulet Baimurza](https://github.com/DauletBai), founder of Qazna Technologies. Solo development since 2026-04-07. Repository public since 2026-05-08. License: BUSL-1.1 (source-available; commercial use by permission).\n\n**How do I cite this work?** See [CITATION.cff](CITATION.cff) and [codemeta.json](codemeta.json). GitHub renders the citation file as a \"Cite this repository\" button on the right sidebar.\n\n## Recent releases\n\n**v6.2.0 — Neurosymbolic agglutinative algebra (env-gated).** The\narchitectural redesign promised at v6.1.50. Lands the new\n`adam-algebra` crate (8 modules, 195 tests) + `adam-dialog::v6_2_router`\nintegration bridge. v6.1 cascade unchanged when the gate is off.\nWorkspace tests 1735 → 1904. Real-Kazakh dialog battery 79/79\nmust-pass, 0 known gaps. Stage 3 typed-kernel micro-path runs\n~470 ns avg; production cascade p50 ~13.6 ms (see *Honest numbers*\nabove for the three latency classes). See [CHANGELOG.md § 6.2.0](CHANGELOG.md).\n\n**v6.1.50 — v6.1-series freeze.** Time-unit Count + Disagreement\nanswer-shape + voice aliases + doc refresh. Final v6.1.x release;\npatch strategy reached its ceiling after 11 releases across two days.\n\n**v6.1.0 — AnswerIR + predicate-aware retrieval (opt-in behind\n`ADAM_ANSWER_IR=1`).** Typed `PredicateFocus` enum + 11 new typed\npredicates (`BornIn` / `DiedIn` / `FoundedIn` / `EffectiveFrom` /\n`Classifies` / `LocatedIn` / `Authored` / etc.) — closes the\n2026-05-22 Codex audit relevance/completeness gap.\n\n**v6.0.0 — Technical GA.** L5.5 TinyAgt neural composer preview\n(opt-in), E1 discriminative intent classifier (95.95 % accuracy\nat 600× lower latency than full cascade), E2 slot extractor,\nWhisper.cpp voice input, KRU / Baitursynuly / AI-Law domain,\nseven rounds of user-driven audit fixes.\n\n**v5.0.0 – v5.4.0 — Voice, multimodal, World-Graph bridge data.**\nOS-native TTS (macOS `Aru`, Linux `espeak-ng`), optional Piper\nbackend, Kazakh G2P. v5.4.0: 25+ dead-end abstract hubs closed,\nderived facts +4 577, bare yes/no IsA route.\n\nFor full release history (540+ releases since 2026-04-07), see\n[CHANGELOG.md](CHANGELOG.md). For the phase-by-phase roadmap,\nsee [`docs/roadmap.md`](docs/roadmap.md).\n\n## Open to collaboration\n\nWe are open to collaboration in every direction:\n\n- **Linguists** — agglutinative morphology, formal phonology, computational semantics\n- **AI researchers** — deterministic / neurosymbolic alternatives to neural inference, formal verification of language models\n- **Educational institutions** — pilot deployments with Kazakh-language students (current focus: Almaty / Astana schools)\n- **National research agencies** — joint research grants from agglutinative-language country agencies\n- **Government / defence** — offline-capable, auditable language AI for Kazakh and related languages; aligned with Kazakhstan's [AI Law of 18 January 2026](data/world_core/kru_baitursynov.jsonl) for defense-relevant high-risk categories\n- **Investors** — angel pre-seed / seed stage who share the thesis that probabilistic AI is not the only path forward\n\nContact: **baimurza.daulet@gmail.com** · [LinkedIn](https://www.linkedin.com/in/daulet-baimurza-4b3506211)\n\nSee [COLLABORATION.md](COLLABORATION.md) for full per-class engagement terms.\n\n## Repository layout\n\n```\ncrates/                Rust workspace (11 crates, L0–L1)\n  adam-algebra/        v6.2 typed neurosymbolic stack (algebra / Frame / QueryIR / FrameIndex / realiser / math_solver / system_clock / corpus_loader)\n  adam-kernel-fst/     FST morphology — phonology, morphotactics, synthesiser + parser\n  adam-dialog/         Dialog pipeline (v6.1 cascade) + v6_2_router (integration bridge)\n  …                    (see Crates table above)\ndata/world_core/       Curated typed-fact graph (jsonl, by domain)\ndata/dialog/           Template repository + curriculum\ndata/retrieval/        Morpheme index + extracted facts + derived facts\ndata/eval/             Live holdouts + cognitive eval datasets\ndata/lexicon_v1/       Apertium-imported roots\ndata/tokenizer/        Curated segmentation roots\ndocs/                  Architecture, roadmap, performance, foundation policies\n  v6_2_architectural_redesign.md  v6.2 design doc (signed off 2026-05-24)\nscripts/               validate_foundation.sh + release tooling\n```\n\nSee [`data/README.md`](data/README.md) for a top-level map of `data/`, and per-subdirectory READMEs for details.\n\n## Foundation policies\n\n[corpus](docs/corpus_policy.md) · [sources](docs/corpus_sources.md) · [curation](docs/curation_workflow.md) · [classification](docs/source_classification.md) · [scoring](docs/source_scoring.md) · [tokenizer](docs/tokenizer_policy.md) · [evaluation](docs/evaluation_policy.md) · [dialog architecture](docs/kazakh_grammar/07_dialog_architecture.md) · [Kazakh grammar reference](docs/kazakh_grammar/README.md)\n\n## Out of scope\n\n- **Probabilistic / LLM-style free generation** — every response is a curated fact retrieved via `FrameIndex` (v6.2) or a template realisation / verbatim corpus quote / rule derivation (v6.1). Nothing invented.\n- **Multilingual input** — Russian queries work for terms whose surface differs from Kazakh (`«вода» / «скорость света» / «столица казахстана»`); same-root same-domain bilingual ambiguity is resolved via `language_filter`. Other languages are research-direction, not currently shipped.\n- **Trained neural LM components in the answer path** — the dialog core stays rule-based: selection weights, suffix priors, PMI, E1 intent classifier; no transformer in the answer path, no free generation. **v6.3 ships closed-set neural components at the speech surface only** (Whisper.cpp STT + Piper TTS + ~1 M-param contextual LM rescorer + ~1 M-param BPE intent classifier in `tools/voice_repl_v6_3`) — they normalise audio ↔ text around the deterministic core; they never invent facts. The Stage 6 in-kernel neural layer of the v6.0 design doc remains unshipped.\n- **Cloud platform work** — adam runs as a single offline binary.\n\n### Graph-First Policy\n\nThe graph layer of `adam` is **Rust-native and repository-native**. No external graph database as a required runtime; no Cypher / Gremlin / SPARQL query layer in the core pipeline; no Python graph stack hidden behind scripts. The canonical graph representation, traversal, and artifact builders live in Rust crates inside this repository. Shell scripts may orchestrate graph builds only as thin wrappers around `cargo run`.\n\n## License\n\n[Business Source License 1.1](LICENSE). Converts automatically to Apache License 2.0 on **2029-01-01**.\n\nNon-commercial and research use is unrestricted today. Commercial use is permitted unless it competes directly with Qazna Technologies LLP products or services.\n\nFor commercial licensing inquiries: **baimurza.daulet@gmail.com**\n\nCopyright © 2026 Qazna Technologies LLP.\n\n---\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"MISSION.md\"\u003eMISSION\u003c/a\u003e ·\n  \u003ca href=\"docs/v6_2_architectural_redesign.md\"\u003ev6.2 ARCH\u003c/a\u003e ·\n  \u003ca href=\"RESEARCH.md\"\u003eRESEARCH\u003c/a\u003e ·\n  \u003ca href=\"COLLABORATION.md\"\u003eCOLLABORATION\u003c/a\u003e ·\n  \u003ca href=\"AGENTS.md\"\u003eAGENTS\u003c/a\u003e ·\n  \u003ca href=\"CHANGELOG.md\"\u003eCHANGELOG\u003c/a\u003e ·\n  \u003ca href=\"docs/roadmap.md\"\u003eroadmap\u003c/a\u003e ·\n  \u003ca href=\"CITATION.cff\"\u003ecite\u003c/a\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqazaq-ai%2Fadam","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqazaq-ai%2Fadam","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqazaq-ai%2Fadam/lists"}