https://github.com/dont-rely-on-nulls/sakura
An Extended Relational Engine written in OCaml
- Host: GitHub
- URL: https://github.com/dont-rely-on-nulls/sakura
- Owner: dont-rely-on-nulls
- License: agpl-3.0
- Created: 2024-10-22T01:04:08.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2026-04-22T01:49:40.000Z (2 months ago)
- Last Synced: 2026-04-22T03:35:20.142Z (2 months ago)
- Topics: ocaml, relational-algebra, relational-database
- Language: OCaml
- Homepage: http://www.dontrelynulls.org/sakura/
- Size: 395 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 11
Metadata Files:
- Readme: README.org
- License: LICENSE
README
#+TITLE: Sakura
* About
Sakura is a semantic-relational algebra engine inspired by
E. F. Codd's RM/T and D. McGoveran's (as understood by Fabian Pascal)
SRDM. It models relations in both intension and extension, providing
their manipulation as first-class elements of an algebra.
* Motivation
We believe the relational model has never been fully implemented. The
fundamental need to express intent and to bind the meaning of a
conceptual model to its data is seldom met by any self-proclaimed
relational database vendor. The emergence of various physically
oriented paradigms in the industry demonstrates not only large-scale
amnesia about fundamentals, but also a lethal manifestation of
problems that should not exist in the first place. The relational data
model, as an applied theory, is a universally rich and concrete
approach to reality that is largely grounded in moderate realism
applied to mathematics, as described in =An Aristotelian Realist
Philosophy of Mathematics: Mathematics as the Science of Quantity and
Structure= by J. Franklin. We believe that unleashing the full
potential of the relational model, as refined over the years despite
overwhelming disregard from the industry as a whole, will raise
efficiency to a scale not yet seen, a potential so far cut short by
relational pretenders and theoretically unsound approaches. Our goal
is to eliminate the application-database distinction while
maintaining a clear separation between a first-order predicate logic
algebra engine and a higher-order predicate logic decision engine
(the application), via an embedded data sublanguage with Sakura as a
possible auxiliary runtime.
* Current State
** What Works Today
- Base finite relations: create, insert, retract, clear
- Built-in immutable/generated domains on database creation:
  =integer=, =natural=, =rational=, =string=, =atom=, =term=
- Query DSL in =query_planner= and =repl=:
  - base relation scan (using relation atom)
  - =select=, =project=, =join=, =theta_join=, =sort=, =take=, =rename=
  - =materialize= (plan-level, with caveats)
- Volcano-style iterator execution model (process-per-operator)
- Visual output modes in =visualizer= (table/tree/nested views)
- XML TCP server protocol commands: =QUERY=, =NEXT=, =CLOSE=
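To make the process-per-operator idea concrete, here is a minimal,
self-contained sketch of a Volcano-style pipeline in plain Erlang. It
is not Sakura's implementation: the module name =volcano_sketch=, the
={next, From}= / =close= message protocol, and every function name
below are assumptions made for illustration only.

#+begin_src erlang
-module(volcano_sketch).
-export([scan/1, select/2, take/2, next/1, drain/1, demo/0]).

%% Hypothetical protocol: each operator is a process that answers
%% {next, From} with {self(), {tuple, T}} or {self(), done}.

%% Leaf operator: emits the elements of a list one at a time.
scan(Tuples) ->
    spawn(fun() -> scan_loop(Tuples) end).

scan_loop(Tuples) ->
    receive
        {next, From} ->
            case Tuples of
                [] -> From ! {self(), done}, scan_loop([]);
                [T | Rest] -> From ! {self(), {tuple, T}}, scan_loop(Rest)
            end;
        close -> ok
    end.

%% Selection: pulls from Child until a tuple satisfies Pred.
select(Child, Pred) ->
    spawn(fun() -> select_loop(Child, Pred) end).

select_loop(Child, Pred) ->
    receive
        {next, From} ->
            From ! {self(), pull_matching(Child, Pred)},
            select_loop(Child, Pred);
        close -> Child ! close, ok
    end.

pull_matching(Child, Pred) ->
    case next(Child) of
        done -> done;
        {tuple, T} ->
            case Pred(T) of
                true -> {tuple, T};
                false -> pull_matching(Child, Pred)
            end
    end.

%% Take: forwards at most N pulls to Child, then reports done.
take(Child, N) ->
    spawn(fun() -> take_loop(Child, N) end).

take_loop(Child, N) ->
    receive
        {next, From} when N =< 0 ->
            From ! {self(), done}, take_loop(Child, 0);
        {next, From} ->
            From ! {self(), next(Child)},
            take_loop(Child, N - 1);
        close -> Child ! close, ok
    end.

%% Synchronous pull of one result from an operator process.
next(Op) ->
    Op ! {next, self()},
    receive {Op, Reply} -> Reply end.

%% Pull until done, then shut the pipeline down from the root.
drain(Op) ->
    case next(Op) of
        done -> Op ! close, [];
        {tuple, T} -> [T | drain(Op)]
    end.

demo() ->
    Rows = [#{name => ann, age => 41}, #{name => bob, age => 25},
            #{name => cy, age => 37}, #{name => di, age => 52}],
    Over30 = fun(T) -> maps:get(age, T) > 30 end,
    drain(take(select(scan(Rows), Over30), 2)).
#+end_src

In this sketch =volcano_sketch:demo()= returns the tuples for =ann= and
=cy=: the =take= process stops pulling after two results, so the
fourth row is never requested from the scan.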
** Known Gaps / Reliability Risks
- Lack of constraints greater than 1OPs;
- =repl:start/0= is destructive: it calls =main:setup/0=, which recreates schema;
- =xml_server= =SCHEMA= path is currently fragile with constraints representation;
- Constraint/intension representation is not yet fully normalized across modules;
- Planner contains dual compilation styles (iterator path + relation path);
- Blocking operators (for example =sort=) over infinite relations can time out unless bounded with =take= first;
- There is currently no substantial automated test suite in-repo despite CI hooks.
See =docs/reliability_roadmap.org= for the detailed remediation plan.
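The interaction between blocking operators and infinite relations can
be illustrated with a small stand-alone sketch. It models an infinite
relation as a lazy stream of closures, which is an assumption for
illustration and not how =src/generators.erl= is necessarily written;
the module and function names are hypothetical.

#+begin_src erlang
-module(bounded_sort_sketch).
-export([naturals/0, take/2, top_desc/1]).

%% An infinite relation modelled as a lazy stream: calling the fun
%% yields {Tuple, NextStream}. Hypothetical stand-in for a generator.
naturals() -> naturals(0).

naturals(N) ->
    fun() -> {#{value => N}, naturals(N + 1)} end.

%% take/2 forces only the first N tuples, so it always terminates.
take(0, _Stream) -> [];
take(N, Stream) when N > 0 ->
    {T, Rest} = Stream(),
    [T | take(N - 1, Rest)].

%% A sort must see every input tuple before it can emit the first
%% one, so it is applied to the bounded prefix, never to the stream.
top_desc(N) ->
    Prefix = take(N, naturals()),
    Desc = fun(A, B) -> maps:get(value, A) >= maps:get(value, B) end,
    lists:sort(Desc, Prefix).
#+end_src

Here =bounded_sort_sketch:top_desc(3)= returns the tuples with values
2, 1, 0. In plan terms this corresponds to the shape
={sort, {take, Plan, N}, Comparator2}=: the =take= sits inside the
=sort=, so the sort only ever receives a finite input.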
* Architecture (High Level)
- =src/operations.erl= :: core storage/versioning and iterator primitives
- =src/constraint.erl= :: domain constraints, inference, validation
- =src/generators.erl= :: infinite relation generators
- =src/relational_operators.erl= :: pure relation-to-relation algebra operators
- =src/query_planner.erl= :: tuple DSL parsing/validation/execution planning
- =src/repl.erl= :: shell-facing query convenience API
- =src/xml_server.erl= :: TCP/XML query interface
- =src/visualizer.erl= :: renderers for result tuples
Data model records are defined in =include/operations.hrl=.
* Query DSL
Plans are Erlang tuples:
- base relation: =employees=
- selection: ={select, employees, fun(T) -> maps:get(age, T) > 30 end}=
- projection: ={project, employees, [name, age]}=
- equijoin: ={join, employees, departments, dept_id}=
- theta join: ={theta_join, Left, Right, Pred2}=
- sort: ={sort, Plan, Comparator2}=
- take: ={take, Plan, N}=
- rename: ={rename, old_attr, new_attr, Plan}=
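One way to read the plan shapes above is through a toy evaluator over
in-memory lists of maps. The sketch below is not =query_planner=; it
assumes that plans nest freely, that =Pred2= takes the left and right
tuple as two arguments, that =Comparator2= behaves like the ordering
fun of =lists:sort/2=, and that projection removes duplicates. The
module name =plan_sketch= and the sample data are hypothetical.

#+begin_src erlang
-module(plan_sketch).
-export([eval/2, demo/0]).

%% Db is a map from relation atom to a list of tuples (maps).
eval(Db, Rel) when is_atom(Rel) ->
    maps:get(Rel, Db);
eval(Db, {select, Plan, Pred}) ->
    [T || T <- eval(Db, Plan), Pred(T)];
eval(Db, {project, Plan, Attrs}) ->
    %% usort drops duplicate tuples (and, as a side effect, orders them).
    lists:usort([maps:with(Attrs, T) || T <- eval(Db, Plan)]);
eval(Db, {join, Left, Right, Attr}) ->
    [maps:merge(L, R) || L <- eval(Db, Left), R <- eval(Db, Right),
                         maps:get(Attr, L) =:= maps:get(Attr, R)];
eval(Db, {theta_join, Left, Right, Pred2}) ->
    %% On clashing attribute names the right tuple's value wins here.
    [maps:merge(L, R) || L <- eval(Db, Left), R <- eval(Db, Right),
                         Pred2(L, R)];
eval(Db, {sort, Plan, Cmp}) ->
    lists:sort(Cmp, eval(Db, Plan));
eval(Db, {take, Plan, N}) ->
    lists:sublist(eval(Db, Plan), N);
eval(Db, {rename, Old, New, Plan}) ->
    [maps:put(New, maps:get(Old, T), maps:remove(Old, T))
     || T <- eval(Db, Plan)].

demo() ->
    Db = #{employees => [#{name => ann, age => 41, dept_id => 1},
                         #{name => bob, age => 25, dept_id => 2},
                         #{name => cy, age => 37, dept_id => 1}],
           departments => [#{dept_id => 1, dept => eng},
                           #{dept_id => 2, dept => ops}]},
    Plan = {project,
            {select, {join, employees, departments, dept_id},
             fun(T) -> maps:get(age, T) > 30 end},
            [name, dept]},
    eval(Db, Plan).
#+end_src

With this sample data, =plan_sketch:demo()= yields two tuples, one for
=ann= and one for =cy=, each carrying only =name= and =dept=. The real
engine executes plans through iterators rather than eager lists, so
this sketch describes intended results, not execution strategy.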
* Running
** Compile
#+begin_src sh
rebar3 compile
#+end_src
** REPL
#+begin_src erlang
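%% Note: repl:start/0 calls main:setup/0, which recreates the schema.
repl:start().
DB = repl:example_db().
%% Scan a base relation by its atom.
repl:q(DB, employees).
%% Sort by age ascending, then keep the first 3 tuples.
repl:q(DB, {take, {sort, employees,
                   fun(A, B) -> maps:get(age, A) =< maps:get(age, B) end}, 3}).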
#+end_src
** XML Server (manual)
Start the server from an Erlang shell, then send line-based commands over TCP:
- =QUERY =
- =NEXT =
- =CLOSE =
* Design Intent
Sakura aims to treat relations as algebraic manipulation units with both:
- extension (actual tuples, finite or generated)
- intension (schema + constraints + derivation meaning)
The current engine already exposes the extension side robustly for
many operations, despite the lack of constraints greater than 1OP.
The main active work is making intension handling fully consistent and
reliable across planners/operators/protocols.
* Roadmap and Docs
- =docs/reliability_roadmap.org= :: precise reliability + correctness roadmap
- =docs/code.org= :: code-oriented notes
- =docs/features.org= :: feature notes (may contain older planning context)
- =docs/constraints.org= :: constraint model notes
* References
- Codd, E. F. (1970), "A Relational Model of Data for Large Shared Data Banks"
- Codd, E. F. (1979), "Extending the Database Relational Model to Capture More Meaning"
* License
See =LICENSE=.