https://github.com/cognica-io/uqa

UQA — Unified Query Algebra: a multi-paradigm database engine unifying relational, text retrieval, vector search, graph query, and geospatial paradigms under a single algebraic structure
bm25 database full-text-search geospatial graph-database information-retrieval multi-paradigm opencypher posting-list python query-engine sql vector-search
Host: GitHub
URL: https://github.com/cognica-io/uqa
Owner: cognica-io
License: agpl-3.0
Created: 2026-03-06T13:08:41.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-04-01T08:30:29.000Z (3 months ago)
Last Synced: 2026-04-01T10:25:19.282Z (3 months ago)
Topics: bm25, database, full-text-search, geospatial, graph-database, information-retrieval, multi-paradigm, opencypher, posting-list, python, query-engine, sql, vector-search
Language: Python
Homepage: https://cognica-io.github.io/uqa/
Size: 70.2 MB
Stars: 3
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: HISTORY.md
- License: LICENSE
- Citation: CITATION.cff

# UQA — Unified Query Algebra

A multi-paradigm database engine that unifies **relational**, **text retrieval**, **vector search**, **graph query**, and **geospatial** paradigms under a single algebraic structure, using posting lists as the universal abstraction. SQL interface targets **PostgreSQL 17** compatibility.

> **Background:** The unified query algebra theory behind this project is already deployed in production as [Cognica Database](https://cognica.io), a commercial multi-paradigm database engine built in C++20/23. UQA is the standalone Python implementation of that theory, open-sourced under AGPL-3.0. It is under active development and serves both as a production-ready embeddable database and as a reference implementation for the underlying algebraic framework.

## Background

Modern data systems are fragmented into specialized engines: relational databases built on relational algebra, search engines on probabilistic IR models, vector databases on geometric similarity, and graph databases on traversal semantics. UQA eliminates this fragmentation by proving that a single algebraic structure can express operations across all four paradigms.

### Posting Lists as Universal Abstraction

The core insight is that **posting lists** — sorted sequences of `(document_id, payload)` pairs — can represent result sets from any paradigm. A posting list $L$ is defined as:

$$
L = [(id_1, payload_1),\ (id_2, payload_2),\ \ldots,\ (id_n, payload_n)]
$$

where $id_i < id_j$ for all $i < j$. A bijection $PL: 2^{\mathcal{D}} \rightarrow \mathcal{L}$ maps document sets to posting lists and back, allowing set-theoretic reasoning to carry over directly.

### Boolean Algebra

The structure $(\mathcal{L},\ \cup,\ \cap,\ \overline{\cdot},\ \emptyset,\ \mathcal{D})$ forms a **complete Boolean algebra** — satisfying commutativity, associativity, distributivity, identity, and complement laws. This means any combination of AND, OR, and NOT across paradigms is algebraically well-defined, and query optimization can exploit lattice-theoretic rewrite rules.
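
As a rough illustration of the algebra described here, the following sketch models posting lists over a small, made-up universe of document ids and checks a few of the laws. The names `pl`, `ids`, `union`, `intersect`, and `complement` are illustrative assumptions, not UQA's API, and payloads are left as `None`.

```python
from typing import Iterable

# Assumption: a tiny universe of document ids standing in for D.
UNIVERSE = frozenset(range(1, 9))


def pl(doc_ids: Iterable[int]) -> list:
    """PL: map a set of document ids to a sorted posting list (payload omitted)."""
    return [(d, None) for d in sorted(set(doc_ids))]


def ids(posting_list: list) -> frozenset:
    """Inverse of PL: recover the document set from a posting list."""
    return frozenset(d for d, _ in posting_list)


def union(a: list, b: list) -> list:
    return pl(ids(a) | ids(b))


def intersect(a: list, b: list) -> list:
    return pl(ids(a) & ids(b))


def complement(a: list) -> list:
    return pl(UNIVERSE - ids(a))


a, b = pl({1, 3, 5}), pl({3, 4, 5, 8})

# De Morgan: NOT (a OR b) == (NOT a) AND (NOT b)
assert complement(union(a, b)) == intersect(complement(a), complement(b))
# Absorption: a OR (a AND b) == a
assert union(a, intersect(a, b)) == a
# Complement laws: a AND NOT a == empty, a OR NOT a == universe
assert intersect(a, complement(a)) == pl(())
assert union(a, complement(a)) == pl(UNIVERSE)
```

Because `pl` and `ids` are inverses of each other, every law that holds for sets of document ids holds for the corresponding posting lists, which is the point of the bijection.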

### Cross-Paradigm Operators

Primitive operators map each paradigm into the posting list space:

| Operator | Definition | Paradigm |
|----------|-----------|----------|
| $T(t)$ | $PL(\lbrace d \in \mathcal{D} \mid t \in term(d, f) \rbrace)$ | Text retrieval |
| $V_\theta(q)$ | $PL(\lbrace d \in \mathcal{D} \mid sim(vec(d, f),\ q) \geq \theta \rbrace)$ | Vector search |
| $KNN_k(q)$ | $PL(D_k)$ where $\|D_k\| = k$, ranked by similarity | Vector search |
| $Filter_{f,v}(L)$ | $L \cap PL(\lbrace d \in \mathcal{D} \mid d.f = v \rbrace)$ | Relational |
| $Score_q(L)$ | $(L,\ [s_1, \ldots, s_{\|L\|}])$ | Scoring |

Because every operator produces a posting list, they compose freely. A hybrid text + vector search is simply an intersection:

$$
Hybrid_{t,q,\theta} = T(t) \cap V_\theta(q)
$$
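
The hybrid intersection can be sketched in a few lines. The corpus, the cosine similarity, and the function names `term_op` and `vector_op` below are hypothetical stand-ins for $T(t)$ and $V_\theta(q)$, not the engine's implementation; posting lists are reduced to sorted id lists.

```python
import math

# Hypothetical toy corpus: doc_id -> (terms, embedding).
CORPUS = {
    1: ({"database", "query"}, (1.0, 0.0)),
    2: ({"database", "index"}, (0.0, 1.0)),
    3: ({"vector", "search"}, (0.9, 0.1)),
    4: ({"database", "vector"}, (0.8, 0.6)),
}


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def term_op(t):
    """T(t): ids of documents containing term t, in sorted order."""
    return sorted(d for d, (terms, _) in CORPUS.items() if t in terms)


def vector_op(q, theta):
    """V_theta(q): ids of documents whose similarity to q is at least theta."""
    return sorted(d for d, (_, vec) in CORPUS.items() if cosine(vec, q) >= theta)


def intersect(a, b):
    """Two-pointer merge of two sorted id lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


hybrid = intersect(term_op("database"), vector_op((1.0, 0.0), 0.7))
print(hybrid)  # [1, 4]
```

The merge relies only on both inputs being sorted by id, which is why any operator that emits a posting list can take part in it.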

### Graph Extension

The second paper in the series (Paper 2, linked at the end of this section) extends the framework to graph data by establishing a **Graph-Posting List Isomorphism**. A graph posting list $L_G = [(id_1, G_1), \ldots, (id_n, G_n)]$ maps to standard posting lists via:

$$
\Phi(L_G) = PL\left(\bigcup_{i=1}^{n} \phi_{G \rightarrow D}(G_i)\right)
$$

This isomorphism preserves Boolean operations — $\Phi(L_G^1 \cup_G L_G^2) = \Phi(L_G^1) \cup \Phi(L_G^2)$ — so graph traversals, pattern matches, and path queries integrate seamlessly with text, vector, and relational operations under the same algebra.
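
A minimal sketch of this property follows, under two assumptions that the text does not spell out: each $G_i$ is represented only by its vertex ids, with $\phi_{G \rightarrow D}$ returning those ids as document ids, and $\cup_G$ merges entries that share an id by taking the union of their subgraphs. All names are illustrative.

```python
def phi_g_to_d(subgraph):
    """Assumed phi_{G->D}: a subgraph contributes its vertex ids as document ids."""
    return set(subgraph)


def phi(graph_posting_list):
    """Phi: flatten a graph posting list into a sorted list of document ids."""
    docs = set()
    for _, subgraph in graph_posting_list:
        docs |= phi_g_to_d(subgraph)
    return sorted(docs)


def union_g(l1, l2):
    """Assumed graph union: merge entries by id, uniting subgraphs with equal ids."""
    merged = {}
    for gid, subgraph in list(l1) + list(l2):
        merged[gid] = merged.get(gid, frozenset()) | subgraph
    return sorted(merged.items())


def union(a, b):
    """Union of two sorted document id lists."""
    return sorted(set(a) | set(b))


lg1 = [(1, frozenset({10, 11})), (2, frozenset({12}))]
lg2 = [(2, frozenset({12, 13})), (3, frozenset({10}))]

# Phi commutes with union on this example.
assert phi(union_g(lg1, lg2)) == union(phi(lg1), phi(lg2))
```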

### Vector Calibration

The fifth paper addresses a fundamental gap in hybrid search: vector similarity scores (cosine similarity, inner product, Euclidean distance) are geometric quantities, not probabilities. A cosine similarity of 0.85 does not mean an 85% chance of relevance, yet hybrid systems routinely combine such scores with calibrated lexical signals through ad-hoc normalization. The paper presents a Bayesian calibration framework that transforms vector scores into calibrated relevance probabilities through a likelihood ratio formulation:

$$
\text{logit}\ P(R=1 \mid d) = \log \frac{f_R(d)}{f_G(d)} + \text{logit}\ P(R=1)
$$

where $f_R(d)$ is the local distance density among relevant documents and $f_G(d)$ is the global background density. This has the same additive structure as Bayesian BM25 calibration, establishing a structural identity between lexical and dense retrieval scoring. Both densities are extracted from statistics already computed during ANN index construction and search — IVF cell populations and intra-cluster distances, HNSW edge distances and search trajectories — at negligible additional cost. The resulting calibrated vector scores integrate with Bayesian BM25 through additive log-odds:

$$
\text{logit}\ P(R \mid d_{vec}, s_{bm25}) = \underbrace{\log \frac{\hat{f}_R(d)}{f_G(d)}}_{\text{calibrated vector}} + \underbrace{\alpha(s_{bm25} - \beta)}_{\text{calibrated lexical}} + \underbrace{\text{logit}\ P_{base}}_{\text{corpus prior}}
$$

This completes the probabilistic unification of sparse and dense retrieval: both paradigms are calibrated through the same Bayesian likelihood ratio structure, each drawing on the statistics of its native index. For full treatment, see [Paper 5](docs/papers/5.%20Vector%20Scores%20as%20Likelihood%20Ratios%20-%20Index-Derived%20Bayesian%20Calibration%20for%20Hybrid%20Search.pdf).
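
The two formulas above are plain arithmetic in log-odds space, as the sketch below tries to show. The density values, `alpha`, `beta`, and the priors are made-up numbers; how UQA estimates the densities from IVF or HNSW statistics is not reproduced here.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def calibrated_vector_logit(f_r, f_g, prior):
    """logit P(R=1 | d) = log(f_R(d) / f_G(d)) + logit P(R=1)."""
    return math.log(f_r / f_g) + logit(prior)


def hybrid_posterior(f_r, f_g, s_bm25, alpha, beta, p_base):
    """Additive log-odds: calibrated vector + calibrated lexical + corpus prior."""
    z = math.log(f_r / f_g) + alpha * (s_bm25 - beta) + logit(p_base)
    return sigmoid(z)


# If a distance is equally likely under both densities, the prior is unchanged.
p_neutral = sigmoid(calibrated_vector_logit(0.2, 0.2, prior=0.1))

# A likelihood ratio of 4 multiplies the prior odds (1/9) by 4, giving 4/13.
p_boosted = sigmoid(calibrated_vector_logit(0.4, 0.1, prior=0.1))

# With s_bm25 == beta the lexical term vanishes and only the vector evidence acts.
p_hybrid = hybrid_posterior(0.4, 0.1, s_bm25=2.0, alpha=1.5, beta=2.0, p_base=0.1)
```

Each signal contributes an additive term, so a neutral signal (likelihood ratio 1, or a BM25 score equal to `beta`) leaves the posterior where the other evidence put it.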

### Compositional Completeness

The framework guarantees that **any query expressible as a combination of relational, text, vector, and graph operations** has a representation in the unified algebra (Theorem 3.3.5). This is not merely an interface unification — the algebraic closure ensures that cross-paradigm queries (e.g., "find papers cited by graph neighbors whose embeddings are similar to a query vector and whose titles match a keyword") are first-class operations with well-defined optimization rules.

For full formal treatment, see [Paper 1](docs/papers/1.%20A%20Unified%20Mathematical%20Framework%20for%20Query%20Algebras%20Across%20Heterogeneous%20Data%20Paradigms.pdf), [Paper 2](docs/papers/2.%20Extending%20the%20Unified%20Mathematical%20Framework%20to%20Support%20Graph%20Data%20Structures.pdf), [Paper 3](docs/papers/3.%20Bayesian%20BM25%20-%20A%20Probabilistic%20Framework%20for%20Hybrid%20Text%20and%20Vector%20Search.pdf), and [Paper 5](docs/papers/5.%20Vector%20Scores%20as%20Likelihood%20Ratios%20-%20Index-Derived%20Bayesian%20Calibration%20for%20Hybrid%20Search.pdf).

## Overview

UQA extends standard SQL with cross-paradigm query functions:

```sql

-- GIN index: enable full-text search on specific columns (PostgreSQL-compatible)
CREATE INDEX idx_articles_gin ON articles USING gin (title, body);
CREATE INDEX idx_papers_gin ON papers USING gin (title, abstract)
    WITH (analyzer='english_stem');

-- Full-text search with @@ operator (query string mini-language)
SELECT title, _score FROM articles
WHERE title @@ 'database AND query' ORDER BY _score DESC;

-- Search result highlighting (matched terms wrapped in tags)
SELECT title, uqa_highlight(body, 'database query') AS snippet
FROM articles WHERE body @@ 'database query'
ORDER BY _score DESC;

-- Custom highlight tags and snippet extraction
SELECT title, uqa_highlight(body, 'search', '', '', 2, 100) AS snippet
FROM articles WHERE body @@ 'search';

-- Faceted search: value counts over search results
SELECT uqa_facets(category) FROM articles WHERE body @@ 'database';
-- Returns: facet_value | facet_count

-- Multi-field facets in a single query
SELECT uqa_facets(category, author) FROM articles WHERE body @@ 'database';
-- Returns: facet_field | facet_value | facet_count

-- Hybrid text + vector fusion via @@
SELECT title, _score FROM articles
WHERE _all @@ 'body:search AND embedding:[0.1, 0.9, 0.0, 0.0]'
ORDER BY _score DESC;

-- Full-text search with BM25 scoring
SELECT title, _score FROM papers
WHERE text_match(title, 'attention transformer') ORDER BY _score DESC;

-- Multi-signal fusion: text + vector + graph
SELECT title, _score FROM papers
WHERE fuse_log_odds(
    text_match(title, 'attention'),
    knn_match(embedding, ARRAY[0.1, 0.2, ...], 5),
    traverse_match(1, 'cited_by', 2)
) AND year >= 2020
ORDER BY _score DESC;

-- Multi-stage retrieval: broad recall -> precise re-ranking
SELECT title, _score FROM papers
WHERE staged_retrieval(
    bayesian_match(title, 'transformer attention'), 50,
    bayesian_match(abstract, 'self attention mechanism'), 10
) ORDER BY _score DESC;

-- Multi-field search across title + abstract
SELECT title, _score FROM papers
WHERE multi_field_match(title, abstract, 'attention transformer')
ORDER BY _score DESC;

-- Deep fusion: multi-layer neural network as SQL
SELECT id, _score FROM patches
WHERE deep_fusion(
    layer(knn_match(embedding, $1, 16)),
    convolve('spatial', ARRAY[0.6, 0.4]),
    pool('spatial', 'max', 2),
    flatten(),
    dense(ARRAY[...], ARRAY[...], output_channels => 4, input_channels => 8),
    softmax(),
    gating => 'relu'
) ORDER BY _score DESC;

-- Deep learning: train a CNN classifier (no backpropagation)
SELECT deep_learn(
    'mnist_cnn', label, embedding, 'spatial',
    convolve(n_channels => 32),
    pool('max', 2),
    attention(n_heads => 4, mode => 'learned_v'),
    convolve(n_channels => 64),
    pool('max', 2),
    flatten(),
    dense(output_channels => 10),
    softmax(),
    gating => 'relu', lambda => 1.0,
    l1_ratio => 0.3, prune_ratio => 0.5
) FROM mnist_train;

-- Deep learning: inference with trained model
SELECT id, deep_predict('mnist_cnn', embedding) AS pred FROM test_data;

-- Deep learning: inference via deep_fusion pipeline
SELECT id, _score, class_probs FROM grid_28x28
WHERE deep_fusion(
    model('mnist_cnn', $1),
    gating => 'relu'
) ORDER BY _score DESC;

-- Temporal graph traversal (edges valid at timestamp)
SELECT * FROM temporal_traverse(1, 'knows', 2, 1700000000);

-- JOINs with qualified columns
SELECT e.name, d.name AS dept, e.salary
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id
ORDER BY e.salary DESC;

-- Window functions with frames
SELECT rep, sale_date, amount,
       SUM(amount) OVER (ORDER BY sale_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM sales;

-- Recursive CTE
WITH RECURSIVE org_tree AS (
    SELECT id, name, 1 AS depth FROM org_chart WHERE manager_id IS NULL
    UNION ALL
    SELECT o.id, o.name, t.depth + 1
    FROM org_chart o INNER JOIN org_tree t ON o.manager_id = t.id
)
SELECT name, depth FROM org_tree ORDER BY depth;

-- Advanced aggregates with FILTER and CASE pivot
SELECT region,
       SUM(amount) FILTER (WHERE returned = FALSE) AS net_revenue,
       COUNT(*) FILTER (WHERE returned = TRUE) AS return_count
FROM sales GROUP BY region;

-- Date/time functions
SELECT DATE_TRUNC('month', sale_date) AS month,
       COUNT(*) AS num_sales, SUM(amount) AS revenue
FROM sales GROUP BY DATE_TRUNC('month', sale_date);

-- Graph traversal and regular path queries
SELECT _doc_id, title FROM traverse(1, 'cited_by', 2);
SELECT _doc_id, title FROM rpq('cited_by/cited_by', 1);

-- Apache AGE compatible graph query (openCypher)
SELECT * FROM create_graph('social');
SELECT * FROM cypher('social', $$
    CREATE (a:Person {name: 'Alice', age: 30})-[:KNOWS]->(b:Person {name: 'Bob', age: 25})
    RETURN a.name, b.name
$$) AS (a_name agtype, b_name agtype);
SELECT * FROM cypher('social', $$
    MATCH (p:Person)-[:KNOWS]->(friend:Person)
    WHERE p.age > 25
    RETURN p.name, friend.name, p.age
    ORDER BY p.name
$$) AS (name agtype, friend agtype, age agtype);

-- Geospatial: R*Tree spatial index with Haversine distance
CREATE TABLE restaurants (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    cuisine TEXT NOT NULL,
    location POINT
);
CREATE INDEX idx_loc ON restaurants USING rtree (location);
SELECT name, ROUND(ST_Distance(location, POINT(-73.9857, 40.7484)), 0) AS dist_m
FROM restaurants
WHERE spatial_within(location, POINT(-73.9857, 40.7484), 5000)
ORDER BY dist_m;

-- Spatial + text + vector fusion
SELECT name, _score FROM restaurants
WHERE fuse_log_odds(
    text_match(description, 'pizza'),
    spatial_within(location, POINT(-73.9969, 40.7306), 3000),
    knn_match(embedding, $1, 5)
) ORDER BY _score DESC;

-- Text analysis: custom analyzer with stemming
SELECT * FROM create_analyzer('english_stem', '{
    "tokenizer": {"type": "standard"},
    "token_filters": [{"type": "lowercase"}, {"type": "stop", "language": "english"},
                      {"type": "porter_stem"}],
    "char_filters": []}');
SELECT * FROM list_analyzers();

-- Foreign Data Wrapper with Hive partitioning
CREATE SERVER warehouse FOREIGN DATA WRAPPER duckdb_fdw;
CREATE FOREIGN TABLE sales (
    id INTEGER, name TEXT, amount INTEGER,
    year INTEGER, month INTEGER
) SERVER warehouse OPTIONS (
    source '/data/sales/**/*.parquet',
    hive_partitioning 'true'
);

-- Predicate pushdown: DuckDB prunes partitions at source
SELECT name, SUM(amount) FROM sales
WHERE year IN (2024, 2025) AND month > 6
GROUP BY name ORDER BY SUM(amount) DESC;

-- Full query pushdown: entire query delegated to DuckDB
-- (JOINs, window functions, subqueries all execute in DuckDB)
SELECT pickup_zone, COUNT(*) AS trips,
       AVG(total_amount) AS avg_total
FROM taxi_trips t
JOIN taxi_zones z ON t.pu_location_id = z.location_id
GROUP BY pickup_zone ORDER BY trips DESC LIMIT 10;

```

## Architecture

```mermaid
graph TD
    SQL[SQL Parser<br/>pglast] --> Compiler[SQL Compiler]
    QB[QueryBuilder<br/>Fluent API] --> Operators
    Compiler --> Optimizer[Query Optimizer]
    Optimizer --> Operators[Operator Tree]
    Operators --> Executor[Plan Executor]
    Executor --> PAR[Parallel Executor<br/>ThreadPool]
    Operators --> Cypher[Cypher Compiler<br/>openCypher]
    PAR --> DS[Document Store<br/>SQLite]
    PAR --> II[Inverted Index<br/>SQLite + Analyzer]
    PAR --> VI["Vector Index<br/>IVF"]
    PAR --> SI[Spatial Index<br/>R*Tree]
    PAR --> GS[Graph Store<br/>SQLite<br/>Named Graphs]

    subgraph Scoring ["Scoring (bayesian-bm25)"]
        BM25[BM25]
        BBFS[Bayesian BM25]
        VS[Vector Scorer]
    end

    subgraph Fusion ["Fusion (bayesian-bm25)"]
        LO[Log-Odds]
        PB[Probabilistic Boolean]
    end

    Operators --> Scoring
    Operators --> Fusion
```

### Package Structure

```
uqa/
  core/           PostingList, types, hierarchical documents, functors
  analysis/       Text analysis pipeline: CharFilter, Tokenizer, TokenFilter, Analyzer, dual index/search analyzers
  storage/        Backend-agnostic ABCs with SQLite-backed stores: documents, inverted index, vectors (IVF), spatial (R*Tree), graph
  operators/      Operator algebra (boolean, primitive, hybrid, aggregation (count/sum/avg/min/max/quantile),
                  hierarchical (with cost estimation), sparse, multi-field, attention fusion,
                  learned fusion, multi-stage, deep fusion (ResNet/GNN/CNN/DenseNet),
                  deep learning (training pipeline, PyTorch GPU backend))
  scoring/        BM25, Bayesian BM25, VectorScorer, WAND/BlockMaxWAND, calibration,
                  parameter learning, external prior, multi-field, fusion WAND (via bayesian-bm25),
                  adaptive WAND, bound tightness
  fusion/         Log-odds conjunction (fuse + fuse_mean), probabilistic boolean, attention fusion,
                  learned fusion, query features (via bayesian-bm25), adaptive fusion
  graph/          GraphStore, traversal, pattern matching, RPQ, bounded RPQ, weighted paths,
                  centrality (PageRank, HITS, betweenness), cross-paradigm, indexes,
                  subgraph index, incremental matching, temporal filter/traverse/pattern,
                  delta/versioned store, message passing, embeddings, named graphs,
                  property indexes, join operators, RPQ optimizer, pattern negation,
                  configurable graph scores (DEFAULT_GRAPH_SCORE)
    cypher/       openCypher lexer, parser, AST, posting-list-based compiler
  fdw/            Foreign Data Wrappers: DuckDB (Parquet/CSV/JSON), Arrow Flight SQL, Hive partitioning, full query pushdown
  joins/          Hash, sort-merge, index, graph, cross-paradigm, similarity joins,
                  semi-join, anti-join
  execution/      Volcano iterator engine: Apache Arrow columnar batches, vectorized operators, disk spilling
  planner/        Cost model, cardinality estimator, optimizer, DPccp join enumerator, parallel executor,
                  information-theoretic bounds, graph cost model
  search/         Search result highlighting, snippet extraction, FTS query term extraction
  sql/            SQL compiler (pglast), expression evaluator, FTS query parser, table DDL/DML
  api/            Fluent QueryBuilder
  tests/          2880 tests across 84 test files
benchmarks/       309 pytest-benchmark tests across 15 files (posting list, storage, compiler,
                  execution, planner, scoring, graph, graph centrality, end-to-end SQL,
                  calibration, multi-field, external prior, advanced scoring, advanced graph,
                  named graphs)
```

## Key Features

### SQL Interface

| Category | Syntax |
|----------|--------|
| DDL | `CREATE TABLE [IF NOT EXISTS]`, `CREATE TEMPORARY TABLE`, `DROP TABLE [IF EXISTS]`, `CREATE TABLE AS SELECT`, `ALTER TABLE` (ADD/DROP/RENAME COLUMN, SET/DROP DEFAULT, SET/DROP NOT NULL, ALTER TYPE USING, ADD CONSTRAINT), `TRUNCATE TABLE`, `CREATE INDEX`, `DROP INDEX`, `CREATE SEQUENCE`/`NEXTVAL`/`CURRVAL`/`SETVAL`, `ALTER SEQUENCE`, `CREATE SCHEMA`/`DROP SCHEMA [CASCADE]`, `TABLE name` |
| FDW | `CREATE SERVER ... FOREIGN DATA WRAPPER`, `CREATE FOREIGN TABLE ... SERVER ... OPTIONS (...)`, `DROP SERVER`, `DROP FOREIGN TABLE`, Hive partitioning (`hive_partitioning` option), predicate pushdown (`=`, `!=`, `<`, `>`, `IN`, `LIKE`, `ILIKE`, `BETWEEN`), full query pushdown (JOINs, aggregates, window functions, subqueries), mixed foreign-local query optimization (local table shipping), DuckDB FDW (Parquet/CSV/JSON), Arrow Flight SQL FDW |
| Constraints | `PRIMARY KEY`, `NOT NULL`, `DEFAULT` (literals and SQL functions like `CURRENT_TIMESTAMP`), `UNIQUE`, `CHECK`, `FOREIGN KEY` (with insert/update/delete validation) |
| DML | `INSERT INTO ... VALUES`, `INSERT INTO ... SELECT`, `INSERT ... ON CONFLICT DO NOTHING/UPDATE`, `INSERT ... RETURNING`, `UPDATE ... SET ... WHERE [RETURNING]`, `UPDATE ... FROM` (join), `DELETE FROM ... WHERE [RETURNING]`, `DELETE ... USING` (join) |
| DQL | `SELECT [DISTINCT] ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY [NULLS FIRST/LAST] ... LIMIT ... OFFSET`, `FETCH FIRST n ROWS ONLY`, standalone `VALUES` |
| Joins | `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, `FULL OUTER JOIN`, `CROSS JOIN` with equality and non-equality `ON` conditions, `LATERAL` subquery |
| Set Ops | `UNION [ALL]`, `INTERSECT [ALL]`, `EXCEPT [ALL]` with chaining |
| Subqueries | `IN (SELECT ...)`, `EXISTS (SELECT ...)`, scalar subqueries, correlated subqueries, derived tables (`FROM (SELECT ...) AS alias`), `LATERAL` |
| CTEs | `WITH name AS (SELECT ...)`, `WITH RECURSIVE` |
| Views | `CREATE VIEW`, `DROP VIEW` |
| Window | `ROW_NUMBER`, `RANK`, `DENSE_RANK`, `NTILE`, `LAG`, `LEAD`, `NTH_VALUE`, `PERCENT_RANK`, `CUME_DIST`, aggregates `OVER (PARTITION BY ... ORDER BY ... ROWS/RANGE BETWEEN ...)`, `WINDOW w AS (...)`, `FILTER (WHERE ...)` on window aggregates |
| Aggregates | `COUNT [DISTINCT]`, `SUM`, `AVG`, `MIN`, `MAX`, `STRING_AGG`, `ARRAY_AGG`, `BOOL_AND`/`EVERY`, `BOOL_OR`, `STDDEV`/`VARIANCE`, `PERCENTILE_CONT/DISC`, `MODE`, `JSON_OBJECT_AGG`, `CORR`, `COVAR_POP/SAMP`, `REGR_*` (10 functions), `deep_learn(...)`, `FILTER (WHERE ...)`, `ORDER BY` within aggregate |
| Types | `INTEGER`, `BIGINT`, `SERIAL`, `TEXT`, `VARCHAR`, `REAL`, `FLOAT`, `DOUBLE PRECISION`, `NUMERIC(p,s)`, `BOOLEAN`, `DATE`, `TIME`, `TIMESTAMP`, `TIMESTAMPTZ`, `INTERVAL`, `JSON`/`JSONB`, `UUID`, `BYTEA`, `INTEGER[]` (arrays), `VECTOR(N)`, `POINT` |
| Date/Time | `EXTRACT`, `DATE_TRUNC`, `DATE_PART`, `NOW()`, `CURRENT_DATE`, `CURRENT_TIMESTAMP`, `CURRENT_TIME`, `CLOCK_TIMESTAMP`, `TIMEOFDAY`, `AGE`, `TO_CHAR`, `TO_DATE`, `TO_TIMESTAMP`, `MAKE_DATE`, `MAKE_TIMESTAMP`, `MAKE_INTERVAL`, `TO_NUMBER`, `OVERLAPS`, `ISFINITE` |
| JSON | `->`, `->>`, `#>`, `#>>` operators, `@>` / `<@` containment, `?` / `?\|` / `?&` key existence, `JSONB_SET`, `JSONB_STRIP_NULLS`, `JSON_BUILD_OBJECT`, `JSON_BUILD_ARRAY`, `JSON_OBJECT_KEYS`, `JSON_EXTRACT_PATH`, `JSON_TYPEOF`, `JSON_AGG`, `::jsonb` cast |
| Table Funcs | `GENERATE_SERIES`, `UNNEST`, `REGEXP_SPLIT_TO_TABLE`, `JSON_EACH`/`JSON_EACH_TEXT`, `JSON_ARRAY_ELEMENTS`/`JSON_ARRAY_ELEMENTS_TEXT` |
| FTS | `column @@ 'query'` full-text search operator with query string mini-language: bare terms, `"phrases"`, `field:term`, `field:[vector]`, `AND`/`OR`/`NOT`, implicit AND, parenthesized grouping, hybrid text+vector fusion, `uqa_highlight()` result highlighting, `uqa_facets()` faceted search |
| Functions | 90+ scalar functions: string (`CONCAT_WS`, `POSITION`, `LPAD`, `REVERSE`, `MD5`, `OVERLAY`, `REGEXP_MATCH`, `ENCODE`/`DECODE`, ...), math (`POWER`, `SQRT`, `LN`, `CBRT`, `GCD`, `LCM`, `MIN_SCALE`, `TRIM_SCALE`, trig, ...), conditional (`GREATEST`, `LEAST`, `NULLIF`) |
| Prepared | `PREPARE name AS ...`, `EXECUTE name(params)`, `DEALLOCATE name` |
| Utility | `EXPLAIN SELECT ...`, `ANALYZE [table]` |
| Transactions | `BEGIN`, `COMMIT`, `ROLLBACK`, `SAVEPOINT`, `RELEASE SAVEPOINT`, `ROLLBACK TO SAVEPOINT` |
| Session | `SET`/`SHOW`/`RESET`/`RESET ALL`, `DISCARD ALL`, `SET search_path`, `SET LOCAL` |
| System | `information_schema.columns`, `pg_catalog.pg_tables`, `pg_catalog.pg_views`, `pg_catalog.pg_indexes`, `pg_catalog.pg_type` |

### Extended WHERE Functions

| Function | Description |
|----------|-------------|
| `column @@ 'query'` | Full-text search operator with query string mini-language (boolean, phrase, field targeting, hybrid text+vector) |
| `text_match(field, 'query')` | Full-text search with BM25 scoring |
| `bayesian_match(field, 'query')` | Bayesian BM25 — calibrated P(relevant) in [0,1] |
| `knn_match(field, vector, k)` | K-nearest neighbor vector search (vector as `ARRAY[...]` or `$N`) |
| `traverse_match(start, 'label', hops)` | Graph reachability as a scored signal |
| `path_filter(path, value)` | Hierarchical equality filter (any-match on arrays) |
| `path_filter(path, op, value)` | Hierarchical comparison filter (`>`, `<`, `>=`, `<=`, `!=`) |
| `spatial_within(field, POINT(x,y), dist)` | Geospatial range query (R*Tree + Haversine) |
| `sparse_threshold(signal, threshold)` | ReLU thresholding: max(0, score - threshold) |
| `multi_field_match(f1, f2, ..., query)` | Multi-field Bayesian BM25 with log-odds fusion |
| `bayesian_match_with_prior(f, q, pf, mode)` | Bayesian BM25 with external prior (recency/authority) |
| `temporal_traverse(start, lbl, hops, ts)` | Time-aware graph traversal |
| `message_passing(k, agg, property)` | GNN k-layer neighbor aggregation |
| `graph_embedding(dims, k)` | Structural graph embeddings |
| `vector_exclude(f, pos, neg, k, theta)` | Vector exclusion: positive minus negative similarity |
| `pagerank([damping[, iter[, tol]]][, 'graph'])` | PageRank centrality scoring |
| `hits([iter[, tol]][, 'graph'])` | HITS hub/authority scoring |
| `betweenness(['graph'])` | Betweenness centrality (Brandes) |
| `weighted_rpq('expr', start, 'prop'[, 'agg'[, threshold]])` | Weighted RPQ with aggregate predicates |

### Fusion Meta-Functions

| Function | Description |
|----------|-------------|
| `fuse_log_odds(sig1, sig2, ...[, alpha][, 'relu'\|'swish'])` | Log-odds conjunction with optional gating |
| `fuse_prob_and(sig1, sig2, ...)` | Probabilistic AND: P = prod(P_i) |
| `fuse_prob_or(sig1, sig2, ...)` | Probabilistic OR: P = 1 - prod(1 - P_i) |
| `fuse_prob_not(signal)` | Probabilistic NOT: P = 1 - P_signal |
| `fuse_attention(sig1, sig2, ...)` | Attention-weighted log-odds fusion |
| `fuse_learned(sig1, sig2, ...)` | Learned-weight log-odds fusion |
| `staged_retrieval(sig1, k1, sig2, k2, ...)` | Multi-stage cascading retrieval pipeline |
| `progressive_fusion(sig1, sig2, k1, sig3, k2[, alpha][, 'gating'])` | Progressive multi-stage WAND fusion |
| `deep_fusion(layer(...), propagate(...), convolve(...), ...[, gating])` | Multi-layer Bayesian fusion (ResNet + GNN + CNN) |
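
The three probabilistic Boolean formulas in this table are simple enough to check by hand. The sketch below restates them in Python; the function names are illustrative stand-ins, not the engine's code, and the log-odds variants are not covered because the table does not give their formulas.

```python
import math


def prob_and(*probs):
    """Probabilistic AND: P = prod(P_i)."""
    return math.prod(probs)


def prob_or(*probs):
    """Probabilistic OR: P = 1 - prod(1 - P_i)."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def prob_not(p):
    """Probabilistic NOT: P = 1 - P_signal."""
    return 1.0 - p


# Two signals that each say "50% relevant".
print(prob_and(0.5, 0.5))  # 0.25
print(prob_or(0.5, 0.5))   # 0.75

# De Morgan carries over: OR(a, b) == NOT(AND(NOT a, NOT b)).
a, b = 0.3, 0.6
assert abs(prob_or(a, b) - prob_not(prob_and(prob_not(a), prob_not(b)))) < 1e-12
```

AND can only lower a score and OR can only raise it, so long conjunctions of moderate probabilities shrink quickly.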

### Deep Fusion Layer Functions

Used inside `deep_fusion()` to compose neural network pipelines:

| Function | Description |
|----------|-------------|
| `layer(sig1, sig2, ...)` | Signal layer: log-odds conjunction with residual connection (ResNet) |
| `propagate('label', 'agg'[, 'dir'])` | Graph propagation: spread scores through edges (GNN) |
| `convolve('label', ARRAY[w...][, 'dir'])` | Spatial convolution: weighted multi-hop BFS aggregation (CNN) |
| `pool('label', 'method', size[, 'dir'])` | Spatial downsampling via greedy BFS partitioning |
| `dense(ARRAY[W], ARRAY[b], output_channels => N, input_channels => M)` | Fully connected layer |
| `flatten()` | Collapse spatial nodes into a single vector |
| `global_pool('avg'\|'max'\|'avg_max')` | Channel-preserving spatial reduction (alternative to flatten) |
| `softmax()` | Classification head (numerically stable) |
| `batch_norm([epsilon => 1e-5])` | Per-channel normalization across nodes |
| `dropout(p)` | Inference-mode scaling by (1 - p) |
| `attention(n_heads => N, mode => 'content'\|'random_qk'\|'learned_v')` | Self-attention: context-dependent PoE (Theorem 8.3) |
| `model('name', $1)` | Load trained model and create full inference pipeline (embed + conv + pool + dense + softmax) |
| `embed(vector, in_channels => C, grid_h => H, grid_w => W)` | Inject raw embedding vector into channel map |
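
Two of the rows above describe behavior that is easy to pin down in isolation: a numerically stable softmax and dropout as inference-mode scaling by (1 - p). The following is a generic sketch of those two techniques, not UQA's implementation, and the function names are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout_inference(values, p):
    """Inference-mode dropout: scale every activation by (1 - p)."""
    return [v * (1.0 - p) for v in values]


# Large logits would overflow math.exp without the max subtraction.
print(softmax([1000.0, 1000.0]))            # [0.5, 0.5]
print(dropout_inference([2.0, 4.0], 0.5))   # [1.0, 2.0]
```

Subtracting the maximum leaves the result unchanged mathematically, since the common factor cancels in the ratio, while keeping every exponent at or below zero.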

### Deep Learning Functions

| Function | Description |
|----------|-------------|
| `deep_learn('model', label, embedding, 'edge_label', layers...[, gating][, lambda][, l1_ratio][, prune_ratio])` | SELECT aggregate: train a CNN classifier analytically (ridge regression, no backpropagation). Optional L1 regularization and magnitude pruning. Layers include `convolve(n_channels => N[, init => 'kaiming'\|'orthogonal'\|'gabor'\|'kmeans'])`, `pool()`, `flatten()`, `global_pool()`, `dense()`, `softmax()`, `attention()`. |
| `deep_predict('model', embedding)` | Per-row scalar: inference with trained model, returns class probabilities |
| `build_grid_graph('table', rows, cols, 'label')` | FROM-clause: construct 4-connected grid graph for spatial convolution |

### SELECT Spatial Functions

| Function | Description |
|----------|-------------|
| `ST_Distance(point1, point2)` | Haversine great-circle distance in meters |
| `ST_Within(point1, point2, dist)` | Distance predicate (boolean) |
| `ST_DWithin(point1, point2, dist)` | Alias for ST_Within |
| `POINT(x, y)` | Construct a POINT value (longitude, latitude) |
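
For readers unfamiliar with the Haversine formula behind `ST_Distance`, here is a minimal standalone sketch. Points follow the (longitude, latitude) order stated above; the Earth radius constant and the function names are assumptions, and the engine may use a slightly different radius.

```python
import math

# Assumption: mean Earth radius in meters.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(p1, p2):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    lon1, lat1 = p1
    lon2, lat2 = p2
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within(p1, p2, dist_m):
    """Distance predicate: True if the two points are at most dist_m meters apart."""
    return haversine_m(p1, p2) <= dist_m


# The two points used in the SQL examples of this README.
a = (-73.9857, 40.7484)
b = (-73.9969, 40.7306)
print(round(haversine_m(a, b)), within(a, b, 5000))
```

An R*Tree index narrows candidates by bounding box first; an exact check like `within` is then only needed for the rows that survive.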

### SELECT Scalar Functions

| Function | Description |
|----------|-------------|
| `path_agg(path, func)` | Per-row nested array aggregation (`sum`, `count`, `avg`, `min`, `max`) |
| `path_value(path)` | Access nested field value by dot-path |
| `uqa_highlight(col, query [, start, end [, frags, size]])` | Highlight matched terms with tags (stemming-aware) |
| `uqa_facets(col [, col2, ...])` | Facet counts over search results (`facet_value \| facet_count`) |
| `deep_predict('model', embedding)` | Inference with trained model (class probabilities) |

### FROM-Clause Table Functions

| Function | Description |
|----------|-------------|
| `traverse(start, 'label', hops)` | BFS graph traversal |
| `rpq('path_expr', start)` | Regular path query (NFA simulation) |
| `text_search('query', 'field', 'table')` | Table-scoped full-text search |
| `generate_series(start, stop[, step])` | Generate a series of values |
| `unnest(array)` | Expand an array to a set of rows |
| `regexp_split_to_table(str, pattern)` | Split string by regex into rows |
| `json_each(json)` / `json_each_text(json)` | Expand JSON object to key/value rows |
| `json_array_elements(json)` | Expand JSON array to a set of rows |
| `pagerank([damping][, 'table_or_graph'])` | PageRank centrality as table source |
| `hits([iter][, 'table_or_graph'])` | HITS hub/authority as table source |
| `betweenness(['table_or_graph'])` | Betweenness centrality as table source |
| `graph_add_vertex(id, 'label', 'table'[, 'props'])` | Add graph vertex to table's graph store |
| `graph_add_edge(eid, src, tgt, 'label', 'table'[, 'props'])` | Add graph edge to table's graph store |
| `create_graph('name')` | Create a named graph namespace |
| `drop_graph('name')` | Drop a named graph |
| `cypher('graph', $$ query $$) AS (cols)` | Execute openCypher query on a named graph |
| `create_analyzer('name', 'config')` | Create a custom text analyzer (JSON config) |
| `drop_analyzer('name')` | Drop a custom text analyzer |
| `set_table_analyzer('tbl', 'field', 'name'[, 'phase'])` | Assign index/search analyzer to a field |
| `list_analyzers()` | List all registered analyzers |
| `build_grid_graph('table', rows, cols, 'label')` | Construct 4-connected grid graph for spatial convolution |
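
To make the `traverse(start, 'label', hops)` row concrete, the sketch below runs a hop-bounded BFS over a hypothetical edge list. Whether UQA includes the start vertex in the result, and how it handles edge direction, is not stated here, so both choices in the code (start included, outgoing edges only) are assumptions.

```python
from collections import deque

# Hypothetical edge list: (source, label, target).
EDGES = [
    (1, "cited_by", 2),
    (2, "cited_by", 3),
    (3, "cited_by", 4),
    (1, "knows", 5),
]


def bfs_traverse(start, label, hops):
    """Return sorted vertex ids reachable from start within `hops` edges of `label`."""
    adjacency = {}
    for src, lbl, tgt in EDGES:
        if lbl == label:
            adjacency.setdefault(src, []).append(tgt)

    seen = {start}  # assumption: the start vertex is part of the result
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue  # hop budget exhausted on this branch
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return sorted(seen)


print(bfs_traverse(1, "cited_by", 2))  # [1, 2, 3]
```

Returning the ids in sorted order means the result already has the shape of a posting list and can be intersected with text or vector results.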

### Persistence

All data is persisted to SQLite when an engine is created with `db_path`:

| Store | SQLite Table | Description |
|-------|-------------|-------------|
| Documents | `_data_{table}` | Typed columns per table |
| Inverted Index | `_inverted_{table}_{field}` | Per-table per-field posting lists |
| Field Stats | `_field_stats_{table}` | Per-table field-level statistics (BM25) |
| Doc Lengths | `_doc_lengths_{table}` | Per-table per-document token lengths (BM25) |
| Vectors | `_ivf_centroids_{table}_{field}`, `_ivf_lists_{table}_{field}` | IVF index via `CREATE INDEX ... USING hnsw` or `USING ivf`; centroids in memory, posting lists in SQLite |
| Spatial | `_rtree_{table}_{field}` | R*Tree virtual table for POINT columns; created via `CREATE INDEX ... USING rtree` |
| Graph | `_graph_vertices_{table}`, `_graph_edges_{table}` | Per-table adjacency-indexed graph with vertex labels |
| Named Graphs | `_graph_catalog_{table}`, `_graph_membership_{table}` | Per-graph partitioned adjacency with catalog and membership tables |
| GIN Indexes | Catalog entry + `fts_fields` | `CREATE INDEX ... USING gin`; controls which columns are indexed in the inverted index |
| B-tree Indexes | SQLite indexes on `_data_{table}` | `CREATE INDEX` support |
| Analyzers | `_analyzers` | Custom text analyzer configurations |
| Field Analyzers | `_table_field_analyzers` | Per-field index/search analyzer assignments |
| Foreign Servers | `_foreign_servers` | FDW server definitions (type, connection options) |
| Foreign Tables | `_foreign_tables` | FDW table definitions (columns, source, options) |
| Path Indexes | `_path_indexes` | Pre-computed label-sequence RPQ accelerators |
| Statistics | `_column_stats` | Per-table histograms and MCVs for optimizer |
| Models | `_models` | Trained deep learning model configurations (JSON) |
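
Since the stores above are described as SQLite tables, a database file can presumably be inspected with Python's standard `sqlite3` module. The sketch below is an illustration only and is not part of UQA: it builds a throwaway in-memory database with hypothetical table names that follow the naming patterns in the table (for a table called `papers`), then groups them by prefix. The helper `list_stores` and the `STORE_PREFIXES` mapping are made up for this example; against a real file you would connect to the path passed as `db_path`, and the tables actually present will depend on which features were used.

```python
import sqlite3

# Prefixes copied from the persistence table; the mapping itself is illustrative.
STORE_PREFIXES = {
    "_data_": "Documents",
    "_inverted_": "Inverted Index",
    "_field_stats_": "Field Stats",
    "_doc_lengths_": "Doc Lengths",
    "_graph_vertices_": "Graph",
    "_graph_edges_": "Graph",
}


def list_stores(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Group the tables of a SQLite database by the store their name suggests."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    stores: dict[str, list[str]] = {}
    for (name,) in rows:
        for prefix, store in STORE_PREFIXES.items():
            if name.startswith(prefix):
                stores.setdefault(store, []).append(name)
                break
    return stores


# Stand-in database with hypothetical tables; a real check would use
# sqlite3.connect("my.db") on a file created by Engine(db_path="my.db").
conn = sqlite3.connect(":memory:")
for name in ("_data_papers", "_inverted_papers_title", "_doc_lengths_papers"):
    conn.execute(f'CREATE TABLE "{name}" (id INTEGER)')

stores = list_stores(conn)
for store, tables in stores.items():
    print(f"{store}: {', '.join(tables)}")
```

Opening the file read-only while an engine is writing to it is a separate concern that this sketch does not address.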

### Query Optimizer

- Algebraic simplification (idempotent intersection/union, absorption law, empty elimination)

- Cost-based optimization with equi-depth histograms and Most Common Values (MCV)

- **DPccp join order optimization** (Moerkotte & Neumann, 2006) — O(3^n) dynamic programming over connected subgraph complement pairs; produces optimal bushy join trees for INNER JOIN chains with 2+ relations; greedy fallback for 16+ relations; bitmask DP table with bytearray connectivity lookup and incremental connected subgraph enumeration

- Filter pushdown into intersections (recursive through nested IntersectOperators)

- Vector threshold merge with floating-point tolerance (same query vector, `np.allclose`)

- Intersect operand reordering by execution cost (cheapest first)

- Fusion signal reordering by cost (cheapest first)

- Early termination in IntersectOperator (skip remaining operands when accumulator is empty)

- Predicate-aware cardinality damping (same-column vs different-column correlation)

- Join-algorithm-aware DPccp cost model (index join vs hash join threshold)

- R*Tree spatial index scan for POINT column range queries

- GIN index for explicit full-text search column management (`CREATE INDEX ... USING gin`)

- B-tree index scan substitution (replace full scans when profitable)

- FDW predicate pushdown (comparison, IN, LIKE, ILIKE, BETWEEN pushed to DuckDB/Arrow Flight SQL for Hive partition pruning)

- FDW full query pushdown (same-server queries delegated entirely to DuckDB/Arrow Flight SQL via AST deparsing)

- FDW mixed foreign-local optimization (small local tables shipped to DuckDB for in-process JOIN execution)

- Cross-paradigm cardinality estimation for text, vector, graph, fusion, temporal, and GNN operators

- Edge property filter pushdown into graph pattern constraints

- Join-pattern fusion (merge intersected pattern matches with shared variables)

- Cross-paradigm join cost models (text similarity, vector similarity, graph, hybrid joins)

- Threshold-aware vector selectivity estimation (4-tier threshold buckets)

- Temporal graph cardinality correction with timestamp/range selectivity

- Path index acceleration for simple Concat-of-Labels RPQ expressions and Cypher MATCH patterns

- CTE inlining for single-reference non-recursive CTEs

- Predicate pushdown into views and derived tables

- Implicit cross join reordering via DPccp when equijoin predicates exist in WHERE

- Filter pushdown into graph traverse operators (vertex predicate BFS pruning)

- Graph-aware fusion signal reordering with per-graph cost model

- Named graph scoped statistics (degree distribution, label degree, vertex label counts)

- Information-theoretic cardinality lower bounds (entropy-based, histogram-aware)

- Hierarchical operator cost estimation (PathFilter, PathProject, PathUnnest, PathAggregate)

- Negation-aware pattern match cost estimation
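
The first item in the list, algebraic simplification, can be illustrated on a toy expression tree. The sketch below is not UQA's optimizer: the `Term`, `Empty`, `And`, and `Or` classes and the `simplify` function are hypothetical names introduced here, and the rewrite rules are the textbook set-algebra laws (idempotence, absorption, empty elimination) applied bottom-up in a single pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str  # stands in for a posting list lookup


@dataclass(frozen=True)
class Empty:
    pass  # the empty posting list


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


def simplify(e):
    """Apply idempotence, absorption, and empty elimination bottom-up."""
    if isinstance(e, (Term, Empty)):
        return e
    left, right = simplify(e.left), simplify(e.right)
    if isinstance(e, And):
        if isinstance(left, Empty) or isinstance(right, Empty):
            return Empty()                      # A ∩ ∅ = ∅
        if left == right:
            return left                         # A ∩ A = A
        if isinstance(right, Or) and left in (right.left, right.right):
            return left                         # A ∩ (A ∪ B) = A
        if isinstance(left, Or) and right in (left.left, left.right):
            return right                        # (A ∪ B) ∩ A = A
        return And(left, right)
    # Remaining case: Or
    if isinstance(left, Empty):
        return right                            # ∅ ∪ A = A
    if isinstance(right, Empty):
        return left                             # A ∪ ∅ = A
    if left == right:
        return left                             # A ∪ A = A
    if isinstance(right, And) and left in (right.left, right.right):
        return left                             # A ∪ (A ∩ B) = A
    if isinstance(left, And) and right in (left.left, left.right):
        return right                            # (A ∩ B) ∪ A = A
    return Or(left, right)


a, b = Term("a"), Term("b")
print(simplify(And(Or(a, a), Or(a, And(a, b)))))  # Term(name='a')
print(simplify(And(a, Empty())))                  # Empty()
```

Frozen dataclasses give structural equality for free, which is what lets the rules compare subtrees with `==`. A production optimizer would likely also normalize operand order so that `A ∩ B` and `B ∩ A` compare equal; this sketch does not.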

### Disk Spilling

Blocking operators (sort, hash-aggregate, distinct) spill intermediate data to temporary Arrow IPC files when the input exceeds `spill_threshold` rows, bounding memory usage for large queries:

| Operator | Strategy |
|----------|----------|
| `SortOp` | External merge sort: sorted runs spilled to disk, merged via k-way min-heap |
| `HashAggOp` | Grace hash: rows partitioned into 16 on-disk files, each aggregated independently |
| `DistinctOp` | Hash partition dedup: same partitioning, per-partition deduplication |

```python

engine = Engine(db_path="my.db", spill_threshold=100000)  # spill after 100K rows

engine = Engine(spill_threshold=0)                        # disable (default)

```
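
To make the `SortOp` strategy concrete, here is a minimal sketch of an external merge sort. It is a simplified stand-in and not UQA's implementation: runs are written with `pickle` to anonymous temporary files instead of Arrow IPC, and the function name `external_sort` and its helpers are hypothetical. It does mirror two details stated above: sorted runs are spilled once the buffer reaches `spill_threshold` rows, and a threshold of 0 disables spilling.

```python
import heapq
import pickle
import tempfile
from typing import IO, Iterable, Iterator


def _spill_run(rows: list) -> IO[bytes]:
    """Sort one in-memory chunk and write it to a temporary file as a run."""
    f = tempfile.TemporaryFile()
    for row in sorted(rows):
        pickle.dump(row, f)
    f.seek(0)
    return f


def _read_run(f: IO[bytes]) -> Iterator:
    """Stream rows back from a spilled run, closing the file at the end."""
    try:
        while True:
            yield pickle.load(f)
    except EOFError:
        f.close()


def external_sort(rows: Iterable, spill_threshold: int) -> Iterator:
    """Yield rows in sorted order, holding at most spill_threshold rows in memory."""
    if spill_threshold <= 0:
        # Spilling disabled: plain in-memory sort.
        yield from sorted(rows)
        return
    runs: list[IO[bytes]] = []
    chunk: list = []
    for row in rows:
        chunk.append(row)
        if len(chunk) >= spill_threshold:
            runs.append(_spill_run(chunk))
            chunk = []
    if not runs:
        # Input never reached the threshold, so nothing was spilled.
        yield from sorted(chunk)
        return
    if chunk:
        runs.append(_spill_run(chunk))
    # heapq.merge performs the k-way merge over the sorted runs with a min-heap.
    yield from heapq.merge(*(_read_run(f) for f in runs))


print(list(external_sort([5, 3, 9, 1, 7, 2, 8], spill_threshold=3)))
# [1, 2, 3, 5, 7, 8, 9]
```

With a threshold of 3, the seven input rows become three runs (`[3, 5, 9]`, `[1, 2, 7]`, `[8]`) that are merged lazily, so memory use is bounded by the chunk size plus one row per run.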

### Parallel Execution

Independent operator branches (Union, Intersect, Fusion signals) execute concurrently via `ThreadPoolExecutor`. Configure the pool size with the `parallel_workers` parameter:

```python

engine = Engine(db_path="my.db", parallel_workers=4)  # default: 4

engine = Engine(parallel_workers=0)                   # disable parallelism

```
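
The general pattern of running independent branches on a thread pool and then combining their results can be sketched as follows. This is an illustration under assumptions, not UQA's executor: `run_branches` and `intersect` are hypothetical helpers, and each branch is modeled as a zero-argument callable returning a set of document IDs. As with the engine option, `parallel_workers=0` falls back to sequential execution.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable

Branch = Callable[[], set[int]]  # hypothetical: a branch yields matching doc IDs


def run_branches(branches: list[Branch], parallel_workers: int = 4) -> list[set[int]]:
    """Evaluate independent branches, concurrently unless parallelism is disabled."""
    if parallel_workers <= 0 or len(branches) < 2:
        return [branch() for branch in branches]
    with ThreadPoolExecutor(max_workers=parallel_workers) as pool:
        # Executor.map returns results in input order, whatever the finish order.
        return list(pool.map(lambda branch: branch(), branches))


def intersect(branches: list[Branch], parallel_workers: int = 4) -> set[int]:
    """Intersect the results of all branches."""
    results = run_branches(branches, parallel_workers)
    if not results:
        return set()
    return reduce(set.intersection, results)


branches = [
    lambda: {1, 2, 3, 4},   # e.g. a text match
    lambda: {2, 3, 4, 5},   # e.g. a vector threshold
    lambda: {3, 4, 6},      # e.g. a graph traversal
]
print(sorted(intersect(branches, parallel_workers=4)))  # [3, 4]
print(sorted(intersect(branches, parallel_workers=0)))  # [3, 4]
```

In CPython, threads mainly help when branches spend their time in I/O or in native code that releases the GIL, so the benefit for a given query mix is worth measuring.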

## Requirements

- Python 3.12+

- numpy >= 1.26

- pyarrow >= 10.0

- bayesian-bm25 >= 0.8.1

- duckdb >= 1.0

- pglast >= 7.0

- prompt-toolkit >= 3.0

- pygments >= 2.17

## Installation

```bash

pip install uqa

# From source

pip install -e .

# With development dependencies

pip install -e ".[dev]"

```

## Usage

### Interactive SQL Shell

```bash

usql                 # In-memory

usql --db mydata.db  # Persistent database

```

Shell commands:

| Command | Description |
|---------|-------------|
| `\dt` | List tables (regular and foreign) |
| `\d ` | Describe table schema (regular or foreign) |
| `\di` | List GIN-indexed fields per table |
| `\dF` | List foreign tables (server, source, options) |
| `\dS` | List foreign servers (type, connection options) |
| `\dg` | List named graphs (vertex/edge counts) |
| `\ds ` | Show column statistics (requires `ANALYZE` first) |
| `\x` | Toggle expanded (vertical) display |
| `\o [file]` | Redirect output to file (no arg restores stdout) |
| `\timing` | Toggle query timing display |
| `\reset` | Reset the engine |
| `\?` | Show help |
| `\q` | Quit |

### Python API

```python

from uqa.engine import Engine

# In-memory engine

engine = Engine()

# Persistent engine (SQLite-backed)

engine = Engine(db_path="research.db")

engine.sql("""

    CREATE TABLE papers (

        id SERIAL PRIMARY KEY,

        title TEXT NOT NULL,

        year INTEGER NOT NULL,

        citations INTEGER DEFAULT 0

    )

""")

engine.sql("""INSERT INTO papers (title, year, citations) VALUES

    ('attention is all you need', 2017, 90000),

    ('bert pre-training', 2019, 75000)

""")

engine.sql("ANALYZE papers")

result = engine.sql("""

    SELECT title, _score FROM papers

    WHERE text_match(title, 'attention') ORDER BY _score DESC

""")

print(result)

# Retrieve a document by ID

doc = engine.get_document(1, table="papers")

engine.close()  # or use: with Engine(db_path="...") as engine:

```

### Fluent QueryBuilder API

```python

from uqa.core.types import Equals, GreaterThanOrEqual

# Text search with scoring

result = (

    engine.query(table="papers")

    .term("attention", field="title")

    .score_bayesian_bm25("attention")

    .execute()

)

# Nested data: filter + aggregate

result = (

    engine.query(table="orders")

    .filter("shipping.city", Equals("Seoul"))

    .path_aggregate("items.price", "sum")

    .execute()

)

# Graph traversal + aggregation

team = engine.query(table="employees").traverse(2, "manages", max_hops=2)

total = team.vertex_aggregate("salary", "sum")

# Multi-field search with per-field weights

result = (

    engine.query(table="papers")

    .score_multi_field_bayesian("attention", ["title", "abstract"], [2.0, 1.0])

    .execute()

)

# Multi-stage pipeline: broad recall -> re-rank

s1 = engine.query(table="papers").score_bayesian_bm25("transformer", "title")

s2 = engine.query(table="papers").score_bayesian_bm25("attention", "abstract")

result = engine.query(table="papers").multi_stage([(s1, 50), (s2, 10)]).execute()

# Temporal graph traversal

result = engine.query(table="social").temporal_traverse(

    1, "knows", max_hops=2, timestamp=1700000000.0

).execute()

# Facets over all documents

facets = engine.query(table="papers").facet("status")

```

### Arrow / Parquet Export

```python

# SQL result -> Arrow Table (zero-copy from execution engine)

result = engine.sql("SELECT title, year FROM papers ORDER BY year")

arrow_table = result.to_arrow()

# SQL result -> Parquet file

result.to_parquet("papers.parquet")

# Fluent API -> Arrow Table

arrow_table = (

    engine.query(table="papers")

    .term("attention", field="title")

    .score_bm25("attention")

    .execute_arrow()

)

# Fluent API -> Parquet file

engine.query(table="papers").term("attention", field="title").execute_parquet("results.parquet")

```

## Examples

### Quickstart

```bash

python examples/quickstart.py      # Hybrid search in under 30 lines

```

### Fluent API (`examples/fluent/`)

```bash

python examples/fluent/text_search.py         # BM25, Bayesian BM25, boolean, facets

python examples/fluent/vector_and_hybrid.py   # KNN, hybrid, vector exclusion, fusion

python examples/fluent/graph.py               # Traversal, RPQ, pattern matching, indexes

python examples/fluent/hierarchical.py        # Nested data, path filters, aggregation

python examples/fluent/multi_paradigm.py      # Multi-signal fusion, graph analytics, query plans

python examples/fluent/scoring.py             # Bayesian BM25, sparse threshold, multi-field, priors

python examples/fluent/fusion_advanced.py     # Attention/learned fusion, multi-stage pipelines

python examples/fluent/graph_advanced.py      # Temporal traversal, message passing, path index, delta

python examples/fluent/graph_centrality.py    # PageRank, HITS, betweenness, bounded RPQ, weighted paths, indexing

python examples/fluent/named_graphs.py        # Named graphs, graph algebra, property indexes, functors, adaptive fusion

python examples/fluent/analysis.py            # Text analysis pipeline, tokenizers, filters, stemming

python examples/fluent/export.py              # Arrow/Parquet export from fluent queries

```

### SQL (`examples/sql/`)

```bash
python examples/sql/basics.py                # DDL, DML, SELECT, CTE, window, transactions, views
python examples/sql/functions.py             # text_match, knn_match, path_agg, path_value, path_filter
python examples/sql/graph.py                 # FROM traverse/rpq, aggregates, GROUP BY, WHERE
python examples/sql/fts_match.py             # @@ operator: boolean, phrase, field, hybrid text+vector
python examples/sql/fusion.py                # fuse_log_odds, fuse_prob_and/or/not, EXPLAIN
python examples/sql/joins_and_subqueries.py  # JOINs, derived tables, set operations, recursive CTE
python examples/sql/analytics.py             # Aggregates, window functions, JSON, date/time, UPSERT
python examples/sql/analysis.py              # Text analyzers via SQL: create, list, drop, persistence
python examples/sql/export.py                # Arrow/Parquet export from SQL queries
python examples/sql/spatial.py               # Geospatial: POINT, R*Tree, spatial_within, ST_Distance, fusion
python examples/sql/synonyms.py              # Synonym search: dual analyzers, index/search-time expansion
python examples/sql/scoring_advanced.py      # Sparse threshold, multi-field, attention/learned fusion, staged retrieval
python examples/sql/calibration.py           # ECE, Brier, reliability diagram, parameter learning
python examples/sql/temporal_graph.py        # Temporal traversal, message passing, graph embeddings
python examples/sql/graph_delta.py           # Delta operations, path index invalidation, rollback
python examples/sql/fdw.py                   # Foreign Data Wrappers, Hive partitioning, predicate pushdown
python examples/sql/nyc_taxi.py              # NYC Taxi analytics, remote Parquet, full query pushdown, spatial JOIN
python examples/sql/fusion_gating.py         # ReLU/Swish gating, alpha+gating, progressive fusion
python examples/sql/graph_centrality.py      # PageRank, HITS, betweenness, bounded RPQ, weighted RPQ via SQL
python examples/sql/named_graphs.py          # Named graphs via SQL: traverse, RPQ, Cypher, centrality, algebra
```

### Showcase (`examples/showcase/`)

```bash

python examples/showcase/knowledge_discovery.py  # Cross-paradigm unification: SQL + FTS + vector + graph + Cypher

python examples/showcase/calibration_matters.py  # Bayesian fusion vs naive scoring, calibration metrics

python examples/showcase/bayesian_neural.py      # Bayesian fusion IS a feedforward neural network (Paper 4)

python examples/showcase/deep_fusion_resnet.py   # Deep fusion as ResNet: hierarchical signal layers

python examples/showcase/deep_fusion_gnn.py      # Deep fusion as GNN: graph propagation layers

python examples/showcase/deep_fusion_cnn.py      # Deep fusion as CNN: spatial convolution over grids

python examples/showcase/deep_fusion_nn.py       # Full neural network: pool, dense, flatten, softmax, batch_norm, dropout

python examples/showcase/deep_learn_mnist.py     # MNIST training pipeline: deep_learn + deep_predict + deep_fusion(model())

python examples/showcase/deep_learn_tiny_imagenet.py  # Tiny ImageNet RGB: 50-class CNN training on 64x64 images

python examples/showcase/deep_learn_attention.py # Self-attention on MNIST: content, random_qk, learned_v modes

python examples/showcase/deep_learn_mnist_pruning.py  # Neural network pruning: elastic net + magnitude pruning

```

### Interactive Shell

```bash

usql                 # In-memory

usql --db mydata.db  # Persistent

```

## Tests

```bash

# Run all tests

python -m pytest uqa/tests/ -v

# Run a specific test file

python -m pytest uqa/tests/test_sql.py -v

# Run benchmarks (requires pytest-benchmark)

pip install -e ".[benchmark]"

python -m pytest benchmarks/ --benchmark-sort=name

```

## License

AGPL-3.0 — see [LICENSE](LICENSE).

## References

1. [A Unified Mathematical Framework for Query Algebras Across Heterogeneous Data Paradigms](docs/papers/1.%20A%20Unified%20Mathematical%20Framework%20for%20Query%20Algebras%20Across%20Heterogeneous%20Data%20Paradigms.pdf)

2. [Extending the Unified Mathematical Framework to Support Graph Data Structures](docs/papers/2.%20Extending%20the%20Unified%20Mathematical%20Framework%20to%20Support%20Graph%20Data%20Structures.pdf)

3. [Bayesian BM25 - A Probabilistic Framework for Hybrid Text and Vector Search](docs/papers/3.%20Bayesian%20BM25%20-%20A%20Probabilistic%20Framework%20for%20Hybrid%20Text%20and%20Vector%20Search.pdf)

4. [From Bayesian Inference to Neural Computation - The Analytical Emergence of Neural Network Structure from Probabilistic Relevance Estimation](docs/papers/4.%20From%20Bayesian%20Inference%20to%20Neural%20Computation%20-%20The%20Analytical%20Emergence%20of%20Neural%20Network%20Structure%20from%20Probabilistic%20Relevance%20Estimation.pdf)

5. [Vector Scores as Likelihood Ratios - Index-Derived Bayesian Calibration for Hybrid Search](docs/papers/5.%20Vector%20Scores%20as%20Likelihood%20Ratios%20-%20Index-Derived%20Bayesian%20Calibration%20for%20Hybrid%20Search.pdf)