{"id":46647673,"url":"https://github.com/cognica-io/uqa","last_synced_at":"2026-04-08T07:01:23.530Z","repository":{"id":342588276,"uuid":"1174452506","full_name":"cognica-io/uqa","owner":"cognica-io","description":"UQA — Unified Query Algebra: a multi-paradigm database engine unifying relational, text retrieval, vector search, graph query, and geospatial paradigms under a single algebraic structure","archived":false,"fork":false,"pushed_at":"2026-04-01T08:30:29.000Z","size":73628,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-01T10:25:19.282Z","etag":null,"topics":["bm25","database","full-text-search","geospatial","graph-database","information-retrieval","multi-paradigm","opencypher","posting-list","python","query-engine","sql","vector-search"],"latest_commit_sha":null,"homepage":"https://cognica-io.github.io/uqa/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cognica-io.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-06T13:08:41.000Z","updated_at":"2026-04-01T08:24:57.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/cognica-io/uqa","commit_stats":null,"previous_names":["cognica-io/uqa"],"tags_count":36,"template":false,"template_full_name":null,"purl":"pkg:github/cognica-io/uqa","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/c
ognica-io%2Fuqa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cognica-io%2Fuqa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cognica-io%2Fuqa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cognica-io%2Fuqa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cognica-io","download_url":"https://codeload.github.com/cognica-io/uqa/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cognica-io%2Fuqa/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31434624,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T08:13:15.228Z","status":"ssl_error","status_checked_at":"2026-04-05T08:13:11.839Z","response_time":75,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bm25","database","full-text-search","geospatial","graph-database","information-retrieval","multi-paradigm","opencypher","posting-list","python","query-engine","sql","vector-search"],"created_at":"2026-03-08T05:09:30.456Z","updated_at":"2026-04-05T12:01:12.134Z","avatar_url":"https://github.com/cognica-io.png","language":"Python","readme":"# UQA — Unified Query Algebra\n\nA multi-paradigm database engine that unifies **relational**, **text retrieval**, **vector search**, **graph query**, and **geospatial** paradigms under a 
single algebraic structure, using posting lists as the universal abstraction. SQL interface targets **PostgreSQL 17** compatibility.\n\n\u003e **Background:** The unified query algebra theory behind this project is already deployed in production as [Cognica Database](https://cognica.io), a commercial multi-paradigm database engine built in C++20/23. UQA is the standalone Python implementation of that theory, open-sourced under AGPL-3.0. It is under active development and serves both as a production-ready embeddable database and as a reference implementation for the underlying algebraic framework.\n\n## Background\n\nModern data systems are fragmented into specialized engines: relational databases built on relational algebra, search engines on probabilistic IR models, vector databases on geometric similarity, and graph databases on traversal semantics. UQA eliminates this fragmentation by proving that a single algebraic structure can express operations across all four paradigms.\n\n### Posting Lists as Universal Abstraction\n\nThe core insight is that **posting lists** — sorted sequences of `(document_id, payload)` pairs — can represent result sets from any paradigm. A posting list $L$ is defined as:\n\n$$\nL = [(id_1, payload_1),\\ (id_2, payload_2),\\ \\ldots,\\ (id_n, payload_n)]\n$$\n\nwhere $id_i \u003c id_j$ for all $i \u003c j$. A bijection $PL: 2^{\\mathcal{D}} \\rightarrow \\mathcal{L}$ maps document sets to posting lists and back, allowing set-theoretic reasoning to carry over directly.\n\n### Boolean Algebra\n\nThe structure $(\\mathcal{L},\\ \\cup,\\ \\cap,\\ \\overline{\\cdot},\\ \\emptyset,\\ \\mathcal{D})$ forms a **complete Boolean algebra** — satisfying commutativity, associativity, distributivity, identity, and complement laws. 
This means any combination of AND, OR, and NOT across paradigms is algebraically well-defined, and query optimization can exploit lattice-theoretic rewrite rules.\n\n### Cross-Paradigm Operators\n\nPrimitive operators map each paradigm into the posting list space:\n\n| Operator | Definition | Paradigm |\n|----------|-----------|----------|\n| $T(t)$ | $PL(\\lbrace d \\in \\mathcal{D} \\mid t \\in term(d, f) \\rbrace)$ | Text retrieval |\n| $V_\\theta(q)$ | $PL(\\lbrace d \\in \\mathcal{D} \\mid sim(vec(d, f),\\ q) \\geq \\theta \\rbrace)$ | Vector search |\n| $KNN_k(q)$ | $PL(D_k)$ where $\\|D_k\\| = k$, ranked by similarity | Vector search |\n| $Filter_{f,v}(L)$ | $L \\cap PL(\\lbrace d \\in \\mathcal{D} \\mid d.f = v \\rbrace)$ | Relational |\n| $Score_q(L)$ | $(L,\\ [s_1, \\ldots, s_{\\|L\\|}])$ | Scoring |\n\nBecause every operator produces a posting list, they compose freely. A hybrid text + vector search is simply an intersection:\n\n$$\nHybrid_{t,q,\\theta} = T(t) \\cap V_\\theta(q)\n$$\n\n### Graph Extension\n\nThe second paper extends the framework to graph data by establishing a **Graph-Posting List Isomorphism**. A graph posting list $L_G = [(id_1, G_1), \\ldots, (id_n, G_n)]$ maps to standard posting lists via:\n\n$$\n\\Phi(L_G) = PL\\left(\\bigcup_{i=1}^{n} \\phi_{G \\rightarrow D}(G_i)\\right)\n$$\n\nThis isomorphism preserves Boolean operations — $\\Phi(L_G^1 \\cup_G L_G^2) = \\Phi(L_G^1) \\cup \\Phi(L_G^2)$ — so graph traversals, pattern matches, and path queries integrate seamlessly with text, vector, and relational operations under the same algebra.\n\n### Vector Calibration\n\nThe fifth paper addresses a fundamental gap in hybrid search: vector similarity scores (cosine similarity, inner product, Euclidean distance) are geometric quantities, not probabilities. A cosine similarity of 0.85 does not mean an 85% chance of relevance, yet hybrid systems routinely combine such scores with calibrated lexical signals through ad-hoc normalization. 
The paper presents a Bayesian calibration framework that transforms vector scores into calibrated relevance probabilities through a likelihood ratio formulation:\n\n$$\n\\text{logit}\\ P(R=1 \\mid d) = \\log \\frac{f_R(d)}{f_G(d)} + \\text{logit}\\ P(R=1)\n$$\n\nwhere $f_R(d)$ is the local distance density among relevant documents and $f_G(d)$ is the global background density. This has the same additive structure as Bayesian BM25 calibration, establishing a structural identity between lexical and dense retrieval scoring. Both densities are extracted from statistics already computed during ANN index construction and search — IVF cell populations and intra-cluster distances, HNSW edge distances and search trajectories — at negligible additional cost. The resulting calibrated vector scores integrate with Bayesian BM25 through additive log-odds:\n\n$$\n\\text{logit}\\ P(R \\mid d_{vec}, s_{bm25}) = \\underbrace{\\log \\frac{\\hat{f}_R(d)}{f_G(d)}}_{\\text{calibrated vector}} + \\underbrace{\\alpha(s_{bm25} - \\beta)}_{\\text{calibrated lexical}} + \\underbrace{\\text{logit}\\ P_{base}}_{\\text{corpus prior}}\n$$\n\nThis completes the probabilistic unification of sparse and dense retrieval: both paradigms are calibrated through the same Bayesian likelihood ratio structure, each drawing on the statistics of its native index. For full treatment, see [Paper 5](docs/papers/5.%20Vector%20Scores%20as%20Likelihood%20Ratios%20-%20Index-Derived%20Bayesian%20Calibration%20for%20Hybrid%20Search.pdf).\n\n### Compositional Completeness\n\nThe framework guarantees that **any query expressible as a combination of relational, text, vector, and graph operations** has a representation in the unified algebra (Theorem 3.3.5). 
This is not merely an interface unification — the algebraic closure ensures that cross-paradigm queries (e.g., \"find papers cited by graph neighbors whose embeddings are similar to a query vector and whose titles match a keyword\") are first-class operations with well-defined optimization rules.\n\nFor full formal treatment, see [Paper 1](docs/papers/1.%20A%20Unified%20Mathematical%20Framework%20for%20Query%20Algebras%20Across%20Heterogeneous%20Data%20Paradigms.pdf), [Paper 2](docs/papers/2.%20Extending%20the%20Unified%20Mathematical%20Framework%20to%20Support%20Graph%20Data%20Structures.pdf), [Paper 3](docs/papers/3.%20Bayesian%20BM25%20-%20A%20Probabilistic%20Framework%20for%20Hybrid%20Text%20and%20Vector%20Search.pdf), and [Paper 5](docs/papers/5.%20Vector%20Scores%20as%20Likelihood%20Ratios%20-%20Index-Derived%20Bayesian%20Calibration%20for%20Hybrid%20Search.pdf).\n\n## Overview\n\nUQA extends standard SQL with cross-paradigm query functions:\n\n```sql\n-- GIN index: enable full-text search on specific columns (PostgreSQL-compatible)\nCREATE INDEX idx_articles_gin ON articles USING gin (title, body);\nCREATE INDEX idx_papers_gin ON papers USING gin (title, abstract)\n    WITH (analyzer='english_stem');\n\n-- Full-text search with @@ operator (query string mini-language)\nSELECT title, _score FROM articles\nWHERE title @@ 'database AND query' ORDER BY _score DESC;\n\n-- Search result highlighting (matched terms wrapped in tags)\nSELECT title, uqa_highlight(body, 'database query') AS snippet\nFROM articles WHERE body @@ 'database query'\nORDER BY _score DESC;\n\n-- Custom highlight tags and snippet extraction\nSELECT title, uqa_highlight(body, 'search', '\u003cem\u003e', '\u003c/em\u003e', 2, 100) AS snippet\nFROM articles WHERE body @@ 'search';\n\n-- Faceted search: value counts over search results\nSELECT uqa_facets(category) FROM articles WHERE body @@ 'database';\n-- Returns: facet_value | facet_count\n\n-- Multi-field facets in a single query\nSELECT 
uqa_facets(category, author) FROM articles WHERE body @@ 'database';\n-- Returns: facet_field | facet_value | facet_count\n\n-- Hybrid text + vector fusion via @@\nSELECT title, _score FROM articles\nWHERE _all @@ 'body:search AND embedding:[0.1, 0.9, 0.0, 0.0]'\nORDER BY _score DESC;\n\n-- Full-text search with BM25 scoring\nSELECT title, _score FROM papers\nWHERE text_match(title, 'attention transformer') ORDER BY _score DESC;\n\n-- Multi-signal fusion: text + vector + graph\nSELECT title, _score FROM papers\nWHERE fuse_log_odds(\n    text_match(title, 'attention'),\n    knn_match(embedding, ARRAY[0.1, 0.2, ...], 5),\n    traverse_match(1, 'cited_by', 2)\n) AND year \u003e= 2020\nORDER BY _score DESC;\n\n-- Multi-stage retrieval: broad recall -\u003e precise re-ranking\nSELECT title, _score FROM papers\nWHERE staged_retrieval(\n    bayesian_match(title, 'transformer attention'), 50,\n    bayesian_match(abstract, 'self attention mechanism'), 10\n) ORDER BY _score DESC;\n\n-- Multi-field search across title + abstract\nSELECT title, _score FROM papers\nWHERE multi_field_match(title, abstract, 'attention transformer')\nORDER BY _score DESC;\n\n-- Deep fusion: multi-layer neural network as SQL\nSELECT id, _score FROM patches\nWHERE deep_fusion(\n    layer(knn_match(embedding, $1, 16)),\n    convolve('spatial', ARRAY[0.6, 0.4]),\n    pool('spatial', 'max', 2),\n    flatten(),\n    dense(ARRAY[...], ARRAY[...], output_channels =\u003e 4, input_channels =\u003e 8),\n    softmax(),\n    gating =\u003e 'relu'\n) ORDER BY _score DESC;\n\n-- Deep learning: train a CNN classifier (no backpropagation)\nSELECT deep_learn(\n    'mnist_cnn', label, embedding, 'spatial',\n    convolve(n_channels =\u003e 32),\n    pool('max', 2),\n    attention(n_heads =\u003e 4, mode =\u003e 'learned_v'),\n    convolve(n_channels =\u003e 64),\n    pool('max', 2),\n    flatten(),\n    dense(output_channels =\u003e 10),\n    softmax(),\n    gating =\u003e 'relu', lambda =\u003e 1.0,\n    l1_ratio 
=\u003e 0.3, prune_ratio =\u003e 0.5\n) FROM mnist_train;\n\n-- Deep learning: inference with trained model\nSELECT id, deep_predict('mnist_cnn', embedding) AS pred FROM test_data;\n\n-- Deep learning: inference via deep_fusion pipeline\nSELECT id, _score, class_probs FROM grid_28x28\nWHERE deep_fusion(\n    model('mnist_cnn', $1),\n    gating =\u003e 'relu'\n) ORDER BY _score DESC;\n\n-- Temporal graph traversal (edges valid at timestamp)\nSELECT * FROM temporal_traverse(1, 'knows', 2, 1700000000);\n\n-- JOINs with qualified columns\nSELECT e.name, d.name AS dept, e.salary\nFROM employees e\nINNER JOIN departments d ON e.dept_id = d.id\nORDER BY e.salary DESC;\n\n-- Window functions with frames\nSELECT rep, sale_date, amount,\n       SUM(amount) OVER (ORDER BY sale_date\n           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total\nFROM sales;\n\n-- Recursive CTE\nWITH RECURSIVE org_tree AS (\n    SELECT id, name, 1 AS depth FROM org_chart WHERE manager_id IS NULL\n    UNION ALL\n    SELECT o.id, o.name, t.depth + 1\n    FROM org_chart o INNER JOIN org_tree t ON o.manager_id = t.id\n)\nSELECT name, depth FROM org_tree ORDER BY depth;\n\n-- Advanced aggregates with FILTER and CASE pivot\nSELECT region,\n       SUM(amount) FILTER (WHERE returned = FALSE) AS net_revenue,\n       COUNT(*) FILTER (WHERE returned = TRUE) AS return_count\nFROM sales GROUP BY region;\n\n-- Date/time functions\nSELECT DATE_TRUNC('month', sale_date) AS month,\n       COUNT(*) AS num_sales, SUM(amount) AS revenue\nFROM sales GROUP BY DATE_TRUNC('month', sale_date);\n\n-- Graph traversal and regular path queries\nSELECT _doc_id, title FROM traverse(1, 'cited_by', 2);\nSELECT _doc_id, title FROM rpq('cited_by/cited_by', 1);\n\n-- Apache AGE compatible graph query (openCypher)\nSELECT * FROM create_graph('social');\n\nSELECT * FROM cypher('social', $$\n    CREATE (a:Person {name: 'Alice', age: 30})-[:KNOWS]-\u003e(b:Person {name: 'Bob', age: 25})\n    RETURN a.name, b.name\n$$) 
AS (a_name agtype, b_name agtype);\n\nSELECT * FROM cypher('social', $$\n    MATCH (p:Person)-[:KNOWS]-\u003e(friend:Person)\n    WHERE p.age \u003e 25\n    RETURN p.name, friend.name, p.age\n    ORDER BY p.name\n$$) AS (name agtype, friend agtype, age agtype);\n\n-- Geospatial: R*Tree spatial index with Haversine distance\nCREATE TABLE restaurants (\n    id SERIAL PRIMARY KEY,\n    name TEXT NOT NULL,\n    cuisine TEXT NOT NULL,\n    location POINT\n);\n\nCREATE INDEX idx_loc ON restaurants USING rtree (location);\n\nSELECT name, ROUND(ST_Distance(location, POINT(-73.9857, 40.7484)), 0) AS dist_m\nFROM restaurants\nWHERE spatial_within(location, POINT(-73.9857, 40.7484), 5000)\nORDER BY dist_m;\n\n-- Spatial + text + vector fusion\nSELECT name, _score FROM restaurants\nWHERE fuse_log_odds(\n    text_match(description, 'pizza'),\n    spatial_within(location, POINT(-73.9969, 40.7306), 3000),\n    knn_match(embedding, $1, 5)\n) ORDER BY _score DESC;\n\n-- Text analysis: custom analyzer with stemming\nSELECT * FROM create_analyzer('english_stem', '{\n    \"tokenizer\": {\"type\": \"standard\"},\n    \"token_filters\": [{\"type\": \"lowercase\"}, {\"type\": \"stop\", \"language\": \"english\"},\n                      {\"type\": \"porter_stem\"}],\n    \"char_filters\": []}');\n\nSELECT * FROM list_analyzers();\n\n-- Foreign Data Wrapper with Hive partitioning\nCREATE SERVER warehouse FOREIGN DATA WRAPPER duckdb_fdw;\n\nCREATE FOREIGN TABLE sales (\n    id INTEGER, name TEXT, amount INTEGER,\n    year INTEGER, month INTEGER\n) SERVER warehouse OPTIONS (\n    source '/data/sales/**/*.parquet',\n    hive_partitioning 'true'\n);\n\n-- Predicate pushdown: DuckDB prunes partitions at source\nSELECT name, SUM(amount) FROM sales\nWHERE year IN (2024, 2025) AND month \u003e 6\nGROUP BY name ORDER BY SUM(amount) DESC;\n\n-- Full query pushdown: entire query delegated to DuckDB\n-- (JOINs, window functions, subqueries all execute in DuckDB)\nSELECT pickup_zone, COUNT(*) AS 
trips,\n       AVG(total_amount) AS avg_total\nFROM taxi_trips t\nJOIN taxi_zones z ON t.pu_location_id = z.location_id\nGROUP BY pickup_zone ORDER BY trips DESC LIMIT 10;\n```\n\n## Architecture\n\n```mermaid\ngraph TD\n    SQL[SQL Parser\u003cbr/\u003epglast] --\u003e Compiler[SQL Compiler]\n    QB[QueryBuilder\u003cbr/\u003eFluent API] --\u003e Operators\n\n    Compiler --\u003e Optimizer[Query Optimizer]\n    Optimizer --\u003e Operators[Operator Tree]\n    Operators --\u003e Executor[Plan Executor]\n    Executor --\u003e PAR[Parallel Executor\u003cbr/\u003eThreadPool]\n    Operators --\u003e Cypher[Cypher Compiler\u003cbr/\u003eopenCypher]\n\n    PAR --\u003e DS[Document Store\u003cbr/\u003eSQLite]\n    PAR --\u003e II[Inverted Index\u003cbr/\u003eSQLite + Analyzer]\n    PAR --\u003e VI[\"Vector Index\u003cbr/\u003eIVF\"]\n    PAR --\u003e SI[Spatial Index\u003cbr/\u003eR*Tree]\n    PAR --\u003e GS[Graph Store\u003cbr/\u003eSQLite\u003cbr/\u003eNamed Graphs]\n\n    subgraph Scoring [\"Scoring (bayesian-bm25)\"]\n        BM25[BM25]\n        BBFS[Bayesian BM25]\n        VS[Vector Scorer]\n    end\n\n    subgraph Fusion [\"Fusion (bayesian-bm25)\"]\n        LO[Log-Odds]\n        PB[Probabilistic Boolean]\n    end\n\n    Operators --\u003e Scoring\n    Operators --\u003e Fusion\n```\n\n### Package Structure\n\n```\nuqa/\n  core/           PostingList, types, hierarchical documents, functors\n  analysis/       Text analysis pipeline: CharFilter, Tokenizer, TokenFilter, Analyzer, dual index/search analyzers\n  storage/        Backend-agnostic ABCs with SQLite-backed stores: documents, inverted index, vectors (IVF), spatial (R*Tree), graph\n  operators/      Operator algebra (boolean, primitive, hybrid, aggregation (count/sum/avg/min/max/quantile),\n                  hierarchical (with cost estimation), sparse, multi-field, attention fusion,\n                  learned fusion, multi-stage, deep fusion (ResNet/GNN/CNN/DenseNet),\n                  deep learning 
(training pipeline, PyTorch GPU backend))\n  scoring/        BM25, Bayesian BM25, VectorScorer, WAND/BlockMaxWAND, calibration,\n                  parameter learning, external prior, multi-field, fusion WAND (via bayesian-bm25),\n                  adaptive WAND, bound tightness\n  fusion/         Log-odds conjunction (fuse + fuse_mean), probabilistic boolean, attention fusion,\n                  learned fusion, query features (via bayesian-bm25), adaptive fusion\n  graph/          GraphStore, traversal, pattern matching, RPQ, bounded RPQ, weighted paths,\n                  centrality (PageRank, HITS, betweenness), cross-paradigm, indexes,\n                  subgraph index, incremental matching, temporal filter/traverse/pattern,\n                  delta/versioned store, message passing, embeddings, named graphs,\n                  property indexes, join operators, RPQ optimizer, pattern negation,\n                  configurable graph scores (DEFAULT_GRAPH_SCORE)\n    cypher/       openCypher lexer, parser, AST, posting-list-based compiler\n  fdw/            Foreign Data Wrappers: DuckDB (Parquet/CSV/JSON), Arrow Flight SQL, Hive partitioning, full query pushdown\n  joins/          Hash, sort-merge, index, graph, cross-paradigm, similarity joins,\n                  semi-join, anti-join\n  execution/      Volcano iterator engine: Apache Arrow columnar batches, vectorized operators, disk spilling\n  planner/        Cost model, cardinality estimator, optimizer, DPccp join enumerator, parallel executor,\n                  information-theoretic bounds, graph cost model\n  search/         Search result highlighting, snippet extraction, FTS query term extraction\n  sql/            SQL compiler (pglast), expression evaluator, FTS query parser, table DDL/DML\n  api/            Fluent QueryBuilder\n  tests/          2880 tests across 84 test files\nbenchmarks/       309 pytest-benchmark tests across 15 files (posting list, storage, compiler,\n                  execution, 
planner, scoring, graph, graph centrality, end-to-end SQL,\n                  calibration, multi-field, external prior, advanced scoring, advanced graph,\n                  named graphs)\n```\n\n## Key Features\n\n### SQL Interface\n\n| Category | Syntax |\n|----------|--------|\n| DDL | `CREATE TABLE [IF NOT EXISTS]`, `CREATE TEMPORARY TABLE`, `DROP TABLE [IF EXISTS]`, `CREATE TABLE AS SELECT`, `ALTER TABLE` (ADD/DROP/RENAME COLUMN, SET/DROP DEFAULT, SET/DROP NOT NULL, ALTER TYPE USING, ADD CONSTRAINT), `TRUNCATE TABLE`, `CREATE INDEX`, `DROP INDEX`, `CREATE SEQUENCE`/`NEXTVAL`/`CURRVAL`/`SETVAL`, `ALTER SEQUENCE`, `CREATE SCHEMA`/`DROP SCHEMA [CASCADE]`, `TABLE name` |\n| FDW | `CREATE SERVER ... FOREIGN DATA WRAPPER`, `CREATE FOREIGN TABLE ... SERVER ... OPTIONS (...)`, `DROP SERVER`, `DROP FOREIGN TABLE`, Hive partitioning (`hive_partitioning` option), predicate pushdown (`=`, `!=`, `\u003c`, `\u003e`, `IN`, `LIKE`, `ILIKE`, `BETWEEN`), full query pushdown (JOINs, aggregates, window functions, subqueries), mixed foreign-local query optimization (local table shipping), DuckDB FDW (Parquet/CSV/JSON), Arrow Flight SQL FDW |\n| Constraints | `PRIMARY KEY`, `NOT NULL`, `DEFAULT` (literals and SQL functions like `CURRENT_TIMESTAMP`), `UNIQUE`, `CHECK`, `FOREIGN KEY` (with insert/update/delete validation) |\n| DML | `INSERT INTO ... VALUES`, `INSERT INTO ... SELECT`, `INSERT ... ON CONFLICT DO NOTHING/UPDATE`, `INSERT ... RETURNING`, `UPDATE ... SET ... WHERE [RETURNING]`, `UPDATE ... FROM` (join), `DELETE FROM ... WHERE [RETURNING]`, `DELETE ... USING` (join) |\n| DQL | `SELECT [DISTINCT] ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY [NULLS FIRST/LAST] ... LIMIT ... 
OFFSET`, `FETCH FIRST n ROWS ONLY`, standalone `VALUES` |\n| Joins | `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, `FULL OUTER JOIN`, `CROSS JOIN` with equality and non-equality `ON` conditions, `LATERAL` subquery |\n| Set Ops | `UNION [ALL]`, `INTERSECT [ALL]`, `EXCEPT [ALL]` with chaining |\n| Subqueries | `IN (SELECT ...)`, `EXISTS (SELECT ...)`, scalar subqueries, correlated subqueries, derived tables (`FROM (SELECT ...) AS alias`), `LATERAL` |\n| CTEs | `WITH name AS (SELECT ...)`, `WITH RECURSIVE` |\n| Views | `CREATE VIEW`, `DROP VIEW` |\n| Window | `ROW_NUMBER`, `RANK`, `DENSE_RANK`, `NTILE`, `LAG`, `LEAD`, `NTH_VALUE`, `PERCENT_RANK`, `CUME_DIST`, aggregates `OVER (PARTITION BY ... ORDER BY ... ROWS/RANGE BETWEEN ...)`, `WINDOW w AS (...)`, `FILTER (WHERE ...)` on window aggregates |\n| Aggregates | `COUNT [DISTINCT]`, `SUM`, `AVG`, `MIN`, `MAX`, `STRING_AGG`, `ARRAY_AGG`, `BOOL_AND`/`EVERY`, `BOOL_OR`, `STDDEV`/`VARIANCE`, `PERCENTILE_CONT/DISC`, `MODE`, `JSON_OBJECT_AGG`, `CORR`, `COVAR_POP/SAMP`, `REGR_*` (10 functions), `deep_learn(...)`, `FILTER (WHERE ...)`, `ORDER BY` within aggregate |\n| Types | `INTEGER`, `BIGINT`, `SERIAL`, `TEXT`, `VARCHAR`, `REAL`, `FLOAT`, `DOUBLE PRECISION`, `NUMERIC(p,s)`, `BOOLEAN`, `DATE`, `TIME`, `TIMESTAMP`, `TIMESTAMPTZ`, `INTERVAL`, `JSON`/`JSONB`, `UUID`, `BYTEA`, `INTEGER[]` (arrays), `VECTOR(N)`, `POINT` |\n| Date/Time | `EXTRACT`, `DATE_TRUNC`, `DATE_PART`, `NOW()`, `CURRENT_DATE`, `CURRENT_TIMESTAMP`, `CURRENT_TIME`, `CLOCK_TIMESTAMP`, `TIMEOFDAY`, `AGE`, `TO_CHAR`, `TO_DATE`, `TO_TIMESTAMP`, `MAKE_DATE`, `MAKE_TIMESTAMP`, `MAKE_INTERVAL`, `TO_NUMBER`, `OVERLAPS`, `ISFINITE` |\n| JSON | `-\u003e`, `-\u003e\u003e`, `#\u003e`, `#\u003e\u003e` operators, `@\u003e` / `\u003c@` containment, `?` / `?|` / `?\u0026` key existence, `JSONB_SET`, `JSONB_STRIP_NULLS`, `JSON_BUILD_OBJECT`, `JSON_BUILD_ARRAY`, `JSON_OBJECT_KEYS`, `JSON_EXTRACT_PATH`, `JSON_TYPEOF`, `JSON_AGG`, `::jsonb` cast |\n| Table Funcs | `GENERATE_SERIES`, 
`UNNEST`, `REGEXP_SPLIT_TO_TABLE`, `JSON_EACH`/`JSON_EACH_TEXT`, `JSON_ARRAY_ELEMENTS`/`JSON_ARRAY_ELEMENTS_TEXT` |\n| FTS | `column @@ 'query'` full-text search operator with query string mini-language: bare terms, `\"phrases\"`, `field:term`, `field:[vector]`, `AND`/`OR`/`NOT`, implicit AND, parenthesized grouping, hybrid text+vector fusion, `uqa_highlight()` result highlighting, `uqa_facets()` faceted search |\n| Functions | 90+ scalar functions: string (`CONCAT_WS`, `POSITION`, `LPAD`, `REVERSE`, `MD5`, `OVERLAY`, `REGEXP_MATCH`, `ENCODE`/`DECODE`, ...), math (`POWER`, `SQRT`, `LN`, `CBRT`, `GCD`, `LCM`, `MIN_SCALE`, `TRIM_SCALE`, trig, ...), conditional (`GREATEST`, `LEAST`, `NULLIF`) |\n| Prepared | `PREPARE name AS ...`, `EXECUTE name(params)`, `DEALLOCATE name` |\n| Utility | `EXPLAIN SELECT ...`, `ANALYZE [table]` |\n| Transactions | `BEGIN`, `COMMIT`, `ROLLBACK`, `SAVEPOINT`, `RELEASE SAVEPOINT`, `ROLLBACK TO SAVEPOINT` |\n| Session | `SET`/`SHOW`/`RESET`/`RESET ALL`, `DISCARD ALL`, `SET search_path`, `SET LOCAL` |\n| System | `information_schema.columns`, `pg_catalog.pg_tables`, `pg_catalog.pg_views`, `pg_catalog.pg_indexes`, `pg_catalog.pg_type` |\n\n### Extended WHERE Functions\n\n| Function | Description |\n|----------|-------------|\n| `column @@ 'query'` | Full-text search operator with query string mini-language (boolean, phrase, field targeting, hybrid text+vector) |\n| `text_match(field, 'query')` | Full-text search with BM25 scoring |\n| `bayesian_match(field, 'query')` | Bayesian BM25 — calibrated P(relevant) in [0,1] |\n| `knn_match(field, vector, k)` | K-nearest neighbor vector search (vector as `ARRAY[...]` or `$N`) |\n| `traverse_match(start, 'label', hops)` | Graph reachability as a scored signal |\n| `path_filter(path, value)` | Hierarchical equality filter (any-match on arrays) |\n| `path_filter(path, op, value)` | Hierarchical comparison filter (`\u003e`, `\u003c`, `\u003e=`, `\u003c=`, `!=`) |\n| `spatial_within(field, POINT(x,y), 
dist)` | Geospatial range query (R*Tree + Haversine) |\n| `sparse_threshold(signal, threshold)` | ReLU thresholding: max(0, score - threshold) |\n| `multi_field_match(f1, f2, ..., query)` | Multi-field Bayesian BM25 with log-odds fusion |\n| `bayesian_match_with_prior(f, q, pf, mode)` | Bayesian BM25 with external prior (recency/authority) |\n| `temporal_traverse(start, lbl, hops, ts)` | Time-aware graph traversal |\n| `message_passing(k, agg, property)` | GNN k-layer neighbor aggregation |\n| `graph_embedding(dims, k)` | Structural graph embeddings |\n| `vector_exclude(f, pos, neg, k, theta)` | Vector exclusion: positive minus negative similarity |\n| `pagerank([damping[, iter[, tol]]][, 'graph'])` | PageRank centrality scoring |\n| `hits([iter[, tol]][, 'graph'])` | HITS hub/authority scoring |\n| `betweenness(['graph'])` | Betweenness centrality (Brandes) |\n| `weighted_rpq('expr', start, 'prop'[, 'agg'[, threshold]])` | Weighted RPQ with aggregate predicates |\n\n### Fusion Meta-Functions\n\n| Function | Description |\n|----------|-------------|\n| `fuse_log_odds(sig1, sig2, ...[, alpha][, 'relu'\\|'swish'])` | Log-odds conjunction with optional gating |\n| `fuse_prob_and(sig1, sig2, ...)` | Probabilistic AND: P = prod(P_i) |\n| `fuse_prob_or(sig1, sig2, ...)` | Probabilistic OR: P = 1 - prod(1 - P_i) |\n| `fuse_prob_not(signal)` | Probabilistic NOT: P = 1 - P_signal |\n| `fuse_attention(sig1, sig2, ...)` | Attention-weighted log-odds fusion |\n| `fuse_learned(sig1, sig2, ...)` | Learned-weight log-odds fusion |\n| `staged_retrieval(sig1, k1, sig2, k2, ...)` | Multi-stage cascading retrieval pipeline |\n| `progressive_fusion(sig1, sig2, k1, sig3, k2[, alpha][, 'gating'])` | Progressive multi-stage WAND fusion |\n| `deep_fusion(layer(...), propagate(...), convolve(...), ...[, gating])` | Multi-layer Bayesian fusion (ResNet + GNN + CNN) |\n\n### Deep Fusion Layer Functions\n\nUsed inside `deep_fusion()` to compose neural network pipelines:\n\n| Function | 
Description |\n|----------|-------------|\n| `layer(sig1, sig2, ...)` | Signal layer: log-odds conjunction with residual connection (ResNet) |\n| `propagate('label', 'agg'[, 'dir'])` | Graph propagation: spread scores through edges (GNN) |\n| `convolve('label', ARRAY[w...][, 'dir'])` | Spatial convolution: weighted multi-hop BFS aggregation (CNN) |\n| `pool('label', 'method', size[, 'dir'])` | Spatial downsampling via greedy BFS partitioning |\n| `dense(ARRAY[W], ARRAY[b], output_channels =\u003e N, input_channels =\u003e M)` | Fully connected layer |\n| `flatten()` | Collapse spatial nodes into a single vector |\n| `global_pool('avg'\\|'max'\\|'avg_max')` | Channel-preserving spatial reduction (alternative to flatten) |\n| `softmax()` | Classification head (numerically stable) |\n| `batch_norm([epsilon =\u003e 1e-5])` | Per-channel normalization across nodes |\n| `dropout(p)` | Inference-mode scaling by (1 - p) |\n| `attention(n_heads =\u003e N, mode =\u003e 'content'\\|'random_qk'\\|'learned_v')` | Self-attention: context-dependent PoE (Theorem 8.3) |\n| `model('name', $1)` | Load trained model and create full inference pipeline (embed + conv + pool + dense + softmax) |\n| `embed(vector, in_channels =\u003e C, grid_h =\u003e H, grid_w =\u003e W)` | Inject raw embedding vector into channel map |\n\n### Deep Learning Functions\n\n| Function | Description |\n|----------|-------------|\n| `deep_learn('model', label, embedding, 'edge_label', layers...[, gating][, lambda][, l1_ratio][, prune_ratio])` | SELECT aggregate: train a CNN classifier analytically (ridge regression, no backpropagation). Optional L1 regularization and magnitude pruning. Layers include `convolve(n_channels =\u003e N[, init =\u003e 'kaiming'\\|'orthogonal'\\|'gabor'\\|'kmeans'])`, `pool()`, `flatten()`, `global_pool()`, `dense()`, `softmax()`, `attention()`. 
|\n| `deep_predict('model', embedding)` | Per-row scalar: inference with trained model, returns class probabilities |\n| `build_grid_graph('table', rows, cols, 'label')` | FROM-clause: construct 4-connected grid graph for spatial convolution |\n\n### SELECT Spatial Functions\n\n| Function | Description |\n|----------|-------------|\n| `ST_Distance(point1, point2)` | Haversine great-circle distance in meters |\n| `ST_Within(point1, point2, dist)` | Distance predicate (boolean) |\n| `ST_DWithin(point1, point2, dist)` | Alias for ST_Within |\n| `POINT(x, y)` | Construct a POINT value (longitude, latitude) |\n\n### SELECT Scalar Functions\n\n| Function | Description |\n|----------|-------------|\n| `path_agg(path, func)` | Per-row nested array aggregation (`sum`, `count`, `avg`, `min`, `max`) |\n| `path_value(path)` | Access nested field value by dot-path |\n| `uqa_highlight(col, query [, start, end [, frags, size]])` | Highlight matched terms with tags (stemming-aware) |\n| `uqa_facets(col [, col2, ...])` | Facet counts over search results (`facet_value \\| facet_count`) |\n| `deep_predict('model', embedding)` | Inference with trained model (class probabilities) |\n\n### FROM-Clause Table Functions\n\n| Function | Description |\n|----------|-------------|\n| `traverse(start, 'label', hops)` | BFS graph traversal |\n| `rpq('path_expr', start)` | Regular path query (NFA simulation) |\n| `text_search('query', 'field', 'table')` | Table-scoped full-text search |\n| `generate_series(start, stop[, step])` | Generate a series of values |\n| `unnest(array)` | Expand an array to a set of rows |\n| `regexp_split_to_table(str, pattern)` | Split string by regex into rows |\n| `json_each(json)` / `json_each_text(json)` | Expand JSON object to key/value rows |\n| `json_array_elements(json)` | Expand JSON array to a set of rows |\n| `pagerank([damping][, 'table_or_graph'])` | PageRank centrality as table source |\n| `hits([iter][, 'table_or_graph'])` | HITS hub/authority as table 
source |\n| `betweenness(['table_or_graph'])` | Betweenness centrality as table source |\n| `graph_add_vertex(id, 'label', 'table'[, 'props'])` | Add graph vertex to table's graph store |\n| `graph_add_edge(eid, src, tgt, 'label', 'table'[, 'props'])` | Add graph edge to table's graph store |\n| `create_graph('name')` | Create a named graph namespace |\n| `drop_graph('name')` | Drop a named graph |\n| `cypher('graph', $$ query $$) AS (cols)` | Execute openCypher query on a named graph |\n| `create_analyzer('name', 'config')` | Create a custom text analyzer (JSON config) |\n| `drop_analyzer('name')` | Drop a custom text analyzer |\n| `set_table_analyzer('tbl', 'field', 'name'[, 'phase'])` | Assign index/search analyzer to a field |\n| `list_analyzers()` | List all registered analyzers |\n| `build_grid_graph('table', rows, cols, 'label')` | Construct 4-connected grid graph for spatial convolution |\n\n### Persistence\n\nAll data is persisted to SQLite when an engine is created with `db_path`:\n\n| Store | SQLite Table | Description |\n|-------|-------------|-------------|\n| Documents | `_data_{table}` | Typed columns per table |\n| Inverted Index | `_inverted_{table}_{field}` | Per-table per-field posting lists |\n| Field Stats | `_field_stats_{table}` | Per-table field-level statistics (BM25) |\n| Doc Lengths | `_doc_lengths_{table}` | Per-table per-document token lengths (BM25) |\n| Vectors | `_ivf_centroids_{table}_{field}`, `_ivf_lists_{table}_{field}` | IVF index via `CREATE INDEX ... USING hnsw` or `USING ivf`; centroids in memory, posting lists in SQLite |\n| Spatial | `_rtree_{table}_{field}` | R*Tree virtual table for POINT columns; created via `CREATE INDEX ... 
USING rtree` |\n| Graph | `_graph_vertices_{table}`, `_graph_edges_{table}` | Per-table adjacency-indexed graph with vertex labels |\n| Named Graphs | `_graph_catalog_{table}`, `_graph_membership_{table}` | Per-graph partitioned adjacency with catalog and membership tables |\n| GIN Indexes | Catalog entry + `fts_fields` | `CREATE INDEX ... USING gin`; controls which columns are indexed in the inverted index |\n| B-tree Indexes | SQLite indexes on `_data_{table}` | `CREATE INDEX` support |\n| Analyzers | `_analyzers` | Custom text analyzer configurations |\n| Field Analyzers | `_table_field_analyzers` | Per-field index/search analyzer assignments |\n| Foreign Servers | `_foreign_servers` | FDW server definitions (type, connection options) |\n| Foreign Tables | `_foreign_tables` | FDW table definitions (columns, source, options) |\n| Path Indexes | `_path_indexes` | Pre-computed label-sequence RPQ accelerators |\n| Statistics | `_column_stats` | Per-table histograms and MCVs for optimizer |\n| Models | `_models` | Trained deep learning model configurations (JSON) |\n\n### Query Optimizer\n\n- Algebraic simplification (idempotent intersection/union, absorption law, empty elimination)\n- Cost-based optimization with equi-depth histograms and Most Common Values (MCV)\n- **DPccp join order optimization** (Moerkotte \u0026 Neumann, 2006) — O(3^n) dynamic programming over connected subgraph complement pairs; produces optimal bushy join trees for INNER JOIN chains with 2+ relations; greedy fallback for 16+ relations; bitmask DP table with bytearray connectivity lookup and incremental connected subgraph enumeration\n- Filter pushdown into intersections (recursive through nested IntersectOperators)\n- Vector threshold merge with floating-point tolerance (same query vector, `np.allclose`)\n- Intersect operand reordering by execution cost (cheapest first)\n- Fusion signal reordering by cost (cheapest first)\n- Early termination in IntersectOperator (skip remaining operands when 
accumulator is empty)\n- Predicate-aware cardinality damping (same-column vs different-column correlation)\n- Join-algorithm-aware DPccp cost model (index join vs hash join threshold)\n- R*Tree spatial index scan for POINT column range queries\n- GIN index for explicit full-text search column management (`CREATE INDEX ... USING gin`)\n- B-tree index scan substitution (replace full scans when profitable)\n- FDW predicate pushdown (comparison, IN, LIKE, ILIKE, BETWEEN pushed to DuckDB/Arrow Flight SQL for Hive partition pruning)\n- FDW full query pushdown (same-server queries delegated entirely to DuckDB/Arrow Flight SQL via AST deparsing)\n- FDW mixed foreign-local optimization (small local tables shipped to DuckDB for in-process JOIN execution)\n- Cross-paradigm cardinality estimation for text, vector, graph, fusion, temporal, and GNN operators\n- Edge property filter pushdown into graph pattern constraints\n- Join-pattern fusion (merge intersected pattern matches with shared variables)\n- Cross-paradigm join cost models (text similarity, vector similarity, graph, hybrid joins)\n- Threshold-aware vector selectivity estimation (4-tier threshold buckets)\n- Temporal graph cardinality correction with timestamp/range selectivity\n- Path index acceleration for simple Concat-of-Labels RPQ expressions and Cypher MATCH patterns\n- CTE inlining for single-reference non-recursive CTEs\n- Predicate pushdown into views and derived tables\n- Implicit cross join reordering via DPccp when equijoin predicates exist in WHERE\n- Filter pushdown into graph traverse operators (vertex predicate BFS pruning)\n- Graph-aware fusion signal reordering with per-graph cost model\n- Named graph scoped statistics (degree distribution, label degree, vertex label counts)\n- Information-theoretic cardinality lower bounds (entropy-based, histogram-aware)\n- Hierarchical operator cost estimation (PathFilter, PathProject, PathUnnest, PathAggregate)\n- Negation-aware pattern match cost 
estimation\n\n### Disk Spilling\n\nBlocking operators (sort, hash-aggregate, distinct) spill intermediate data to temporary Arrow IPC files when the input exceeds `spill_threshold` rows, bounding memory usage for large queries:\n\n| Operator | Strategy |\n|----------|----------|\n| `SortOp` | External merge sort — sorted runs spilled to disk, merged via k-way min-heap |\n| `HashAggOp` | Grace hash — rows partitioned into 16 on-disk files, each aggregated independently |\n| `DistinctOp` | Hash partition dedup — same partitioning, per-partition deduplication |\n\n```python\nengine = Engine(db_path=\"my.db\", spill_threshold=100000)  # spill after 100K rows\nengine = Engine(spill_threshold=0)                        # disable (default)\n```\n\n### Parallel Execution\n\nIndependent operator branches (Union, Intersect, Fusion signals) execute concurrently via `ThreadPoolExecutor`. Configure with `parallel_workers` parameter:\n\n```python\nengine = Engine(db_path=\"my.db\", parallel_workers=4)  # default: 4\nengine = Engine(parallel_workers=0)                   # disable parallelism\n```\n\n## Requirements\n\n- Python 3.12+\n- numpy \u003e= 1.26\n- pyarrow \u003e= 10.0\n- bayesian-bm25 \u003e= 0.8.1\n- duckdb \u003e= 1.0\n- pglast \u003e= 7.0\n- prompt-toolkit \u003e= 3.0\n- pygments \u003e= 2.17\n\n## Installation\n\n```bash\npip install uqa\n\n# From source\npip install -e .\n\n# With development dependencies\npip install -e \".[dev]\"\n```\n\n## Usage\n\n### Interactive SQL Shell\n\n```bash\nusql                 # In-memory\nusql --db mydata.db  # Persistent database\n```\n\nShell commands:\n\n| Command | Description |\n|---------|-------------|\n| `\\dt` | List tables (regular and foreign) |\n| `\\d \u003ctable\u003e` | Describe table schema (regular or foreign) |\n| `\\di` | List GIN-indexed fields per table |\n| `\\dF` | List foreign tables (server, source, options) |\n| `\\dS` | List foreign servers (type, connection options) |\n| `\\dg` | List named graphs 
(vertex/edge counts) |\n| `\\ds \u003ctable\u003e` | Show column statistics (requires `ANALYZE` first) |\n| `\\x` | Toggle expanded (vertical) display |\n| `\\o [file]` | Redirect output to file (no arg restores stdout) |\n| `\\timing` | Toggle query timing display |\n| `\\reset` | Reset the engine |\n| `\\?` | Show help |\n| `\\q` | Quit |\n\n### Python API\n\n```python\nfrom uqa.engine import Engine\n\n# In-memory engine\nengine = Engine()\n\n# Persistent engine (SQLite-backed)\nengine = Engine(db_path=\"research.db\")\n\nengine.sql(\"\"\"\n    CREATE TABLE papers (\n        id SERIAL PRIMARY KEY,\n        title TEXT NOT NULL,\n        year INTEGER NOT NULL,\n        citations INTEGER DEFAULT 0\n    )\n\"\"\")\n\nengine.sql(\"\"\"INSERT INTO papers (title, year, citations) VALUES\n    ('attention is all you need', 2017, 90000),\n    ('bert pre-training', 2019, 75000)\n\"\"\")\n\nengine.sql(\"ANALYZE papers\")\n\nresult = engine.sql(\"\"\"\n    SELECT title, _score FROM papers\n    WHERE text_match(title, 'attention') ORDER BY _score DESC\n\"\"\")\nprint(result)\n\n# Retrieve a document by ID\ndoc = engine.get_document(1, table=\"papers\")\n\nengine.close()  # or use: with Engine(db_path=\"...\") as engine:\n```\n\n### Fluent QueryBuilder API\n\n```python\nfrom uqa.core.types import Equals, GreaterThanOrEqual\n\n# Text search with scoring\nresult = (\n    engine.query(table=\"papers\")\n    .term(\"attention\", field=\"title\")\n    .score_bayesian_bm25(\"attention\")\n    .execute()\n)\n\n# Nested data: filter + aggregate\nresult = (\n    engine.query(table=\"orders\")\n    .filter(\"shipping.city\", Equals(\"Seoul\"))\n    .path_aggregate(\"items.price\", \"sum\")\n    .execute()\n)\n\n# Graph traversal + aggregation\nteam = engine.query(table=\"employees\").traverse(2, \"manages\", max_hops=2)\ntotal = team.vertex_aggregate(\"salary\", \"sum\")\n\n# Multi-field search with per-field weights\nresult = (\n    engine.query(table=\"papers\")\n    
.score_multi_field_bayesian(\"attention\", [\"title\", \"abstract\"], [2.0, 1.0])\n    .execute()\n)\n\n# Multi-stage pipeline: broad recall -\u003e re-rank\ns1 = engine.query(table=\"papers\").score_bayesian_bm25(\"transformer\", \"title\")\ns2 = engine.query(table=\"papers\").score_bayesian_bm25(\"attention\", \"abstract\")\nresult = engine.query(table=\"papers\").multi_stage([(s1, 50), (s2, 10)]).execute()\n\n# Temporal graph traversal\nresult = engine.query(table=\"social\").temporal_traverse(\n    1, \"knows\", max_hops=2, timestamp=1700000000.0\n).execute()\n\n# Facets over all documents\nfacets = engine.query(table=\"papers\").facet(\"status\")\n```\n\n### Arrow / Parquet Export\n\n```python\n# SQL result -\u003e Arrow Table (zero-copy from execution engine)\nresult = engine.sql(\"SELECT title, year FROM papers ORDER BY year\")\narrow_table = result.to_arrow()\n\n# SQL result -\u003e Parquet file\nresult.to_parquet(\"papers.parquet\")\n\n# Fluent API -\u003e Arrow Table\narrow_table = (\n    engine.query(table=\"papers\")\n    .term(\"attention\", field=\"title\")\n    .score_bm25(\"attention\")\n    .execute_arrow()\n)\n\n# Fluent API -\u003e Parquet file\nengine.query(table=\"papers\").term(\"attention\", field=\"title\").execute_parquet(\"results.parquet\")\n```\n\n## Examples\n\n### Quickstart\n\n```bash\npython examples/quickstart.py      # Hybrid search in under 30 lines\n```\n\n### Fluent API (`examples/fluent/`)\n\n```bash\npython examples/fluent/text_search.py         # BM25, Bayesian BM25, boolean, facets\npython examples/fluent/vector_and_hybrid.py   # KNN, hybrid, vector exclusion, fusion\npython examples/fluent/graph.py               # Traversal, RPQ, pattern matching, indexes\npython examples/fluent/hierarchical.py        # Nested data, path filters, aggregation\npython examples/fluent/multi_paradigm.py      # Multi-signal fusion, graph analytics, query plans\npython examples/fluent/scoring.py             # Bayesian BM25, sparse threshold, 
multi-field, priors\npython examples/fluent/fusion_advanced.py     # Attention/learned fusion, multi-stage pipelines\npython examples/fluent/graph_advanced.py      # Temporal traversal, message passing, path index, delta\npython examples/fluent/graph_centrality.py    # PageRank, HITS, betweenness, bounded RPQ, weighted paths, indexing\npython examples/fluent/named_graphs.py        # Named graphs, graph algebra, property indexes, functors, adaptive fusion\npython examples/fluent/analysis.py            # Text analysis pipeline, tokenizers, filters, stemming\npython examples/fluent/export.py              # Arrow/Parquet export from fluent queries\n```\n\n### SQL (`examples/sql/`)\n\n```bash\npython examples/sql/basics.py                 # DDL, DML, SELECT, CTE, window, transactions, views\npython examples/sql/functions.py              # text_match, knn_match, path_agg, path_value, path_filter\npython examples/sql/graph.py                  # FROM traverse/rpq, aggregates, GROUP BY, WHERE\npython examples/sql/fts_match.py              # @@ operator: boolean, phrase, field, hybrid text+vector\npython examples/sql/fusion.py                 # fuse_log_odds, fuse_prob_and/or/not, EXPLAIN\npython examples/sql/joins_and_subqueries.py   # JOINs, derived tables, set operations, recursive CTE\npython examples/sql/analytics.py              # Aggregates, window functions, JSON, date/time, UPSERT\npython examples/sql/analysis.py               # Text analyzers via SQL: create, list, drop, persistence\npython examples/sql/export.py                 # Arrow/Parquet export from SQL queries\npython examples/sql/spatial.py                # Geospatial: POINT, R*Tree, spatial_within, ST_Distance, fusion\npython examples/sql/synonyms.py               # Synonym search: dual analyzers, index/search-time expansion\npython examples/sql/scoring_advanced.py       # Sparse threshold, multi-field, attention/learned fusion, staged retrieval\npython examples/sql/calibration.py            # ECE, Brier, 
reliability diagram, parameter learning\npython examples/sql/temporal_graph.py         # Temporal traversal, message passing, graph embeddings\npython examples/sql/graph_delta.py            # Delta operations, path index invalidation, rollback\npython examples/sql/fdw.py                    # Foreign Data Wrappers, Hive partitioning, predicate pushdown\npython examples/sql/nyc_taxi.py               # NYC Taxi analytics, remote Parquet, full query pushdown, spatial JOIN\npython examples/sql/fusion_gating.py          # ReLU/Swish gating, alpha+gating, progressive fusion\npython examples/sql/graph_centrality.py       # PageRank, HITS, betweenness, bounded RPQ, weighted RPQ via SQL\npython examples/sql/named_graphs.py           # Named graphs via SQL: traverse, RPQ, Cypher, centrality, algebra\n```\n\n### Showcase (`examples/showcase/`)\n\n```bash\npython examples/showcase/knowledge_discovery.py  # Cross-paradigm unification: SQL + FTS + vector + graph + Cypher\npython examples/showcase/calibration_matters.py  # Bayesian fusion vs naive scoring, calibration metrics\npython examples/showcase/bayesian_neural.py      # Bayesian fusion IS a feedforward neural network (Paper 4)\npython examples/showcase/deep_fusion_resnet.py   # Deep fusion as ResNet: hierarchical signal layers\npython examples/showcase/deep_fusion_gnn.py      # Deep fusion as GNN: graph propagation layers\npython examples/showcase/deep_fusion_cnn.py      # Deep fusion as CNN: spatial convolution over grids\npython examples/showcase/deep_fusion_nn.py       # Full neural network: pool, dense, flatten, softmax, batch_norm, dropout\npython examples/showcase/deep_learn_mnist.py     # MNIST training pipeline: deep_learn + deep_predict + deep_fusion(model())\npython examples/showcase/deep_learn_tiny_imagenet.py  # Tiny ImageNet RGB: 50-class CNN training on 64x64 images\npython examples/showcase/deep_learn_attention.py # Self-attention on MNIST: content, random_qk, learned_v modes\npython 
examples/showcase/deep_learn_mnist_pruning.py  # Neural network pruning: elastic net + magnitude pruning\n```\n\n### Interactive Shell\n\n```bash\nusql                 # In-memory\nusql --db mydata.db  # Persistent\n```\n\n## Tests\n\n```bash\n# Run all tests\npython -m pytest uqa/tests/ -v\n\n# Run a specific test file\npython -m pytest uqa/tests/test_sql.py -v\n\n# Run benchmarks (requires pytest-benchmark)\npip install -e \".[benchmark]\"\npython -m pytest benchmarks/ --benchmark-sort=name\n```\n\n## License\n\nAGPL-3.0 — see [LICENSE](LICENSE).\n\n## References\n\n1. [A Unified Mathematical Framework for Query Algebras Across Heterogeneous Data Paradigms](docs/papers/1.%20A%20Unified%20Mathematical%20Framework%20for%20Query%20Algebras%20Across%20Heterogeneous%20Data%20Paradigms.pdf)\n2. [Extending the Unified Mathematical Framework to Support Graph Data Structures](docs/papers/2.%20Extending%20the%20Unified%20Mathematical%20Framework%20to%20Support%20Graph%20Data%20Structures.pdf)\n3. [Bayesian BM25 - A Probabilistic Framework for Hybrid Text and Vector Search](docs/papers/3.%20Bayesian%20BM25%20-%20A%20Probabilistic%20Framework%20for%20Hybrid%20Text%20and%20Vector%20Search.pdf)\n4. [From Bayesian Inference to Neural Computation - The Analytical Emergence of Neural Network Structure from Probabilistic Relevance Estimation](docs/papers/4.%20From%20Bayesian%20Inference%20to%20Neural%20Computation%20-%20The%20Analytical%20Emergence%20of%20Neural%20Network%20Structure%20from%20Probabilistic%20Relevance%20Estimation.pdf)\n5. 
[Vector Scores as Likelihood Ratios - Index-Derived Bayesian Calibration for Hybrid Search](docs/papers/5.%20Vector%20Scores%20as%20Likelihood%20Ratios%20-%20Index-Derived%20Bayesian%20Calibration%20for%20Hybrid%20Search.pdf)\n